Multivariate Time-Series Prediction with BQML

Last winter, I gave a presentation on ‘More predictable time-series model with BQML’ at GDG DevFest Tashkent 2022 in Tashkent, the capital of Uzbekistan.

I was going to share some of the material and code from the presentation after DevFest, but time has passed, and new features have been released in BQML that overlap some of the content.

Therefore, I will instead briefly cover the new features and some of the things that are still valid.

Time series data is used by many organizations for a variety of purposes, and it is important to note that “predictive analytics” is all about the “future” in time. Time series predictive analytics has been used for the short, medium, and long term, and while it has many inaccuracies and risks, it has also been steadily improving.

Since “prediction” seems to be so useful, you might be tempted to apply a time series prediction model whenever you have time series data. But time series prediction models are usually computationally intensive, and the more data you have, the more intensive they become. That makes it cumbersome and hard to process the data, load it into an analytics environment, and analyze it.

If you are using Google BigQuery for data management, you can use BQML (BigQuery ML) to apply machine learning algorithms to your data in a simple, easy, and fast way. Many people use BigQuery to process large volumes of data, and much of that data is time series data. BQML also supports time series models.

The basis of the time series model currently supported by BQML is the AutoRegressive Integrated Moving Average (ARIMA) model. The ARIMA model predicts using only the existing time series data and is known to have good short-term prediction performance. Because it combines AR and MA, it is a popular model that can cover a wide range of time series models.

However, this model is computationally intensive overall, and since it assumes the series is stationary (after differencing), it is difficult to use as-is in cases with strong trends or seasonality. Therefore, ARIMA_PLUS in BQML includes several additional features as options. You can add time series decomposition, seasonality factors, spikes and dips, coefficient changes, and more to your model, or you can go through them separately and adjust the model manually. I also personally like the fact that you can adjust for periodicity by automatically incorporating holiday options, which is one of the benefits of using a platform that does not require you to add date-related information by hand.

You can refer to this page for more information.
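To make the "AR" and "integrated" parts of ARIMA a little more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how BQML fits ARIMA_PLUS: there is no MA term, no seasonality, and no automatic order selection. The function names and the sample series are made up for illustration.

```python
def difference(series):
    """First-order differencing: the "I" (integrated) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den if den else 0.0


def forecast(series, steps):
    """Toy ARIMA(1,1,0)-style forecast: AR(1) on the differences, then integrate back."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff      # predict the next change
        last_value += last_diff          # undo the differencing
        predictions.append(last_value)
    return predictions


# Hypothetical series with a steady upward trend (not stationary as-is).
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
print(difference(series))   # the differenced series has no trend left
print(forecast(series, 3))
```

Differencing turns the trending series into a flat one that the AR step can handle, which is the basic reason plain ARIMA copes with simple trends but needs extra machinery for seasonality, holidays, and spikes.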

However, when it comes to real-world applications, time series prediction is not as simple as this. Of course, we have been able to identify multiple cycles and add interventions to multiple time series with ARIMA_PLUS, but there are many external factors related to time series data, and very few events happen in isolation. Stationarity can be hard to find in time series data.

In the original presentation, I looked at how to deal with this kind of real-world time series data when building a prediction model: decompose the time series, clean up the decomposed data, import it into Python, weave it together with other variables to create a multivariate time series function, estimate causality and incorporate it into a prediction model, and estimate the degree to which the effect varies with changes in events.

And in just the past few months, a new feature for creating multivariate time series functions with external variables (ARIMA_PLUS_XREG, XREG below) has become an official feature in BQML.

You can read all about it here (it is in preview as of July 2023, but I am guessing it will be generally available later this year).
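As a rough intuition for what external variables add, here is a minimal Python sketch of one common approach: regress the target on an external regressor, then leave the residuals to a univariate time series model. This is an assumption for illustration only, not a description of BQML's internals, and the temperature and pm25 values are made-up numbers, not the tutorial data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily values: the external regressor and the target series.
temperature = [10.0, 12.0, 14.0, 16.0, 18.0]
pm25 = [30.5, 25.5, 22.0, 18.5, 13.5]

slope, intercept = fit_line(temperature, pm25)

# What the regressor cannot explain; a univariate model (for example ARIMA)
# would then be fitted to this residual series.
residuals = [y - (intercept + slope * t) for t, y in zip(temperature, pm25)]

print(slope, intercept)
print(residuals)
```

The point of the split is that a future forecast needs future values of the regressors as well, which is why an XREG-style model is evaluated and forecast with the external columns supplied alongside the dates.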

I applied the official tutorial to see how XREG works and how it compares with a traditional univariate time series model.

The steps are the same as in the tutorial, so I won’t duplicate them, but here are the two models I created. First, I created a traditional ARIMA_PLUS model, and then an XREG model using the same data but adding the temperature and wind speed at the time.

# ARIMA_PLUS: univariate model trained on pm25 only
CREATE OR REPLACE MODEL test_dt_us.seattle_pm25_plus_model
OPTIONS (
  MODEL_TYPE = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'pm25') AS
SELECT date, pm25
FROM test_dt_us.seattle_air_quality_daily
WHERE date BETWEEN DATE('2012-01-01') AND DATE('2020-12-31');

# ARIMA_PLUS_XREG: same target, with temperature and wind_speed as external regressors
CREATE OR REPLACE MODEL test_dt_us.seattle_pm25_xreg_model
OPTIONS (
  MODEL_TYPE = 'ARIMA_PLUS_XREG',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'pm25') AS
SELECT date, pm25, temperature, wind_speed
FROM test_dt_us.seattle_air_quality_daily
WHERE date BETWEEN DATE('2012-01-01') AND DATE('2020-12-31');

A model that uses multiple data series like these would look something like this:

The two models are compared with ML.EVALUATE.

# Evaluate the ARIMA_PLUS model on data after the training window
SELECT *
FROM ML.EVALUATE(
  MODEL test_dt_us.seattle_pm25_plus_model,
  (SELECT date, pm25
   FROM test_dt_us.seattle_air_quality_daily
   WHERE date > DATE('2020-12-31')));

# Evaluate the ARIMA_PLUS_XREG model on the same period, supplying its regressors
SELECT *
FROM ML.EVALUATE(
  MODEL test_dt_us.seattle_pm25_xreg_model,
  (SELECT date, pm25, temperature, wind_speed
   FROM test_dt_us.seattle_air_quality_daily
   WHERE date > DATE('2020-12-31')),
  STRUCT(TRUE AS perform_aggregation, 30 AS horizon));

Results are below.

ARIMA_PLUS

ARIMA_PLUS_XREG

You can see that the XREG model is ahead on basic performance metrics such as MAE, MSE, and MAPE. (Obviously, this is not a perfect solution; it is data-dependent, and all we can say is that we have gained another useful tool.)
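For readers less familiar with these metrics, here is a minimal Python sketch of how MAE, MSE, and MAPE are usually defined. The numbers are made up for illustration and are not the results of either model above. MAPE is expressed as a percentage here, which is an assumption of this sketch and may differ from the scale a given tool reports.

```python
def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error: like MAE, but large misses are penalized more."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined when an actual is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed values and forecasts.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

print(mae(actual, predicted))
print(mse(actual, predicted))
print(mape(actual, predicted))
```

Lower is better for all three. Because MAPE divides by the actual value, it is sensitive to days where the observed value is close to zero, so it is worth reading it alongside MAE rather than on its own.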

Multivariate time series analysis is a much-needed option in many cases, but it is often difficult to apply for various reasons. Now we can use it when those reasons lie in the data and analysis steps. It looks like we have a good option for that, so it is good to know about it, and hopefully it will be useful in many cases.

JeongMin Kwon is a freelance senior Data Scientist with 10+ years of hands-on experience leveraging machine learning models and data mining.