Analyzing sleep data from Oura ring

The Oura Cloud website lets users download their raw data for analysis. I thought that the Oura app was recommending a bedtime that was too early and wanted to analyze the data myself. I downloaded my data to a spreadsheet and loaded it into statistical analysis software (SPSS v25). I learned that the app was correct: on nights when I went to bed during the recommended window, I had a better balance between REM sleep and deep sleep (chart below) and higher sleep efficiency (not shown).
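The same comparison can be sketched in a few lines of Python with pandas. The column names, the 22:00-23:00 window, and all the values below are hypothetical, invented for illustration; adapt the names to the headers in your own Oura export.

```python
# Minimal sketch: compare nights inside vs. outside a recommended bedtime
# window.  Column names and values are made up -- not a real Oura export.
import pandas as pd

nights = pd.DataFrame({
    "bedtime_hour":   [21.5, 22.3, 22.8, 23.9, 24.5, 22.5],  # decimal hours
    "rem_hours":      [1.2,  1.8,  1.9,  1.1,  0.9,  1.7],
    "deep_hours":     [1.0,  1.4,  1.5,  0.8,  0.7,  1.3],
    "efficiency_pct": [82,   90,   91,   79,   75,   89],
})

# True for nights that started inside the hypothetical recommended window.
in_window = nights["bedtime_hour"].between(22.0, 23.0)
summary = nights.groupby(in_window)[["rem_hours", "deep_hours", "efficiency_pct"]].mean()
print(summary)
```

With real data, the two group means (and a t-test or confidence interval on the difference) would show whether in-window nights actually look better.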


[Chart: bedtime versus deep and REM sleep]

It is possible to use the Oura Cloud trends feature to analyze relationships between variables over time without downloading data. For example, I was able to see the relationship between bedtime and deep sleep. In the chart below, the purple area is the time I went to bed and the red line is the hours of deep sleep I got. When I went to bed late (April), I got less deep sleep. When I went to bed at the recommended time (June), I got more deep sleep.

[Chart: bedtime compared to deep sleep over time]

Using the ring’s data, I have reduced my sleep onset insomnia and my sleep maintenance insomnia. Unfortunately, it seems to have led to early awakenings. That is, I sleep more restfully and so wake up earlier. I feel great and ready to get up, but I wish I could get a full 7 hours of sleep. I don’t need to wake up at 5:30 or 6:00 a.m. I suspect that more daytime activity and a darker room would help, but those are changes I am finding hard to make. I am going to experiment with a breathing meditation when I wake up too early and see if that helps me relax and go back to sleep.

When I was looking for information on how to encourage sleeping later, I stumbled onto yet another review of the Oura ring. The physician concluded that the ring was “hype” because his perception of his sleep upon awakening did not match the ring’s assessment of sleep stages. He did not compare the ring’s assessment to another consumer device, to a clinical device, or to a personal sleep journal. He simply dismissed it when it did not match his perception.

Accuracy is a relative thing when you are dealing with perceptions, so it would make more sense to test the helpfulness of the sleep tracker by making changes in behavior and looking at their impact on the perception of sleep, the ring’s metrics, and other measures of sleep (from another device or a journal). 
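One way to run that kind of cross-measure check is to see whether the ring’s metrics and a nightly journal rank the same nights as better or worse after a behavior change. A minimal sketch, with invented numbers and the simple no-ties formula for Spearman’s rank correlation:

```python
# Sketch: do a ring metric and a self-rated sleep journal agree on which
# nights were better?  All values are invented; the closed-form Spearman
# formula below assumes no tied values.
def spearman_rho(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

ring_deep_hours = [0.7, 0.9, 0.8, 1.3, 1.5, 1.4]  # nights 1-3: old habit, 4-6: new bedtime
journal_rating  = [3,   4,   5,   7,   9,   8]    # 1-10 self-rated sleep quality
rho = spearman_rho(ring_deep_hours, journal_rating)
print(f"rho = {rho:.2f}")
```

A strong positive correlation across a behavior change is weak evidence on its own, but combined with a second device or a journal it says more than a single morning’s impression does.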

He mocked the readiness score, but indicated that he did not look at variables other than sleep upon which the readiness score is based (saving that for a future review). Nor did he look at whether the ring was hype for all metrics or just for sleep staging.

He did present data from a systematic study of the ring’s performance. He wrongly concluded that a 51% to 65% agreement between the ring and polysomnography was simple chance agreement. He pointed out that the ring over- or underestimated sleep stages for some individuals, as though he expected a ring to measure with no error at all. However, the range of over- and underestimation seen in the study was typical of medical research based on correlations and averages. He did not present other variables measured in the study that did match polysomnography (e.g., sleep onset latency, total sleep time, and wake after sleep onset). An unfortunate oversight.
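To see why raw percent agreement is not the same as chance agreement, a chance-corrected statistic such as Cohen’s kappa compares observed agreement against what the two raters’ stage distributions would produce by luck alone. A minimal sketch with invented epoch labels (not data from the study):

```python
# Cohen's kappa: chance-corrected agreement between two raters.
# The epoch labels below are invented for illustration only.
from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Expected agreement from the marginal stage distributions alone.
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

ring = ["light", "light", "deep", "rem", "light", "deep", "wake", "rem",
        "light", "deep", "light", "rem", "wake", "light", "deep", "light"]
psg  = ["light", "deep", "rem", "rem", "deep", "deep", "wake", "light",
        "light", "light", "deep", "rem", "wake", "light", "rem", "light"]

agree = sum(x == y for x, y in zip(ring, psg)) / len(ring)
print(f"raw agreement = {agree:.0%}, kappa = {cohens_kappa(ring, psg):.2f}")
```

Here raw agreement lands in the mid-50s percent, yet kappa is well above zero, i.e., clearly better than chance. Whether a given kappa is good enough for a consumer wearable is a separate question, but “51% to 65% is just chance” does not follow from the raw percentage.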

He did not discuss the real problems with the study. They are the problems that characterize most medical research. For example, the participants slept in a laboratory, so their sleep may not represent normal patterns of sleep, thus reducing the predictive value of the ring’s algorithms. Polysomnography is notorious for being riddled with artifacts, unless the technician is highly experienced, collects data for at least three nights, and uses participants who are not restless during sleep. I am not saying that medical research is not valuable, only pointing out that the tools used to study human physiology and behavior are imperfect. Correlations between different tools are rarely high. It is not a simple process to validate tools and metrics. It certainly requires more than a comparison to perception.

Contrast his review with this one, in which the author used a wrist-worn sleep tracker and experimented for months with changing his lifestyle to find ways to improve his sleep performance. He kept track of the data and used it, along with his perception, to evaluate the impact of improved sleep on his quality of life. This approach seems more productive and in the long run does more to validate tools and metrics. I wish he had used an Oura ring.

It would be lovely to have sufficient time and money to use a case-study approach with a variety of tools and collect case studies from a large, random sample of people. That would give a good idea of the value of tracking sleep and which devices and apps do it most effectively. In the absence of sufficient resources, reviewers of consumer devices need to invest in systematically evaluating and reporting their lived experiences.

For another review of the ring that addresses the same issues in a more balanced and thoughtful way, read this one.
