REDO… How do you do?

It's been quite a while since my last post, and I am finally breaking the silence. It took me this long because I was torn between two topics: do I continue the Tuning series from where I left off, or do I follow up my last post on TEMP tablespace configuration with another piece of database configuration that plays a vital role in performance?
Well, I made up my mind, and as the headline suggests, this post is going to be about REDO. Quite recently I was discussing with one of my LinkedIn friends a problem he had run into with his REDO log files. He shared some information about how his REDO log files were configured: how many groups, how many logs per group, the size of each log, their location, and so on. Eventually we zeroed in on the problem: his REDO log files were switching every 20 minutes, with each log file sized at 1GB. WOW, that is some good amount of REDO being generated. (If you want to take the same inventory on your own database, the dynamic views below give a quick picture.)
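A simple sketch using the standard dynamic performance views (the column selection is just what I find useful, nothing more):

-- groups, members, sizes and status of the online redo logs
select l.group#, l.thread#, l.bytes / 1024 / 1024 size_mb, l.members, l.status, f.member
from v$log l, v$logfile f
where f.group# = l.group#
order by l.group#;

-- how frequently the logs have been switching, hour by hour
select to_char(first_time, 'YYYY-MM-DD HH24') hr, count(*) switches
from v$log_history
group by to_char(first_time, 'YYYY-MM-DD HH24')
order by 1;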
From this scenario, we can only conclude that we need to create enough REDO log files, with ample space, to minimize such rapid log file switches, because they cause the database to wait for the log switch to complete before finishing the transactions currently in the queue. But how do we know how much REDO is enough? This is not something we can predict beforehand, especially for a brand-new system. If we know how much data bandwidth our database uses, we might have some idea, but in most cases this work happens after the Oracle installation and configuration, once we are actually working with the database.
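As a rough rule of thumb (a back-of-the-envelope sketch, not an official formula), the current redo rate multiplied by the desired time between switches gives a first-cut log size. The rate itself is easy to pull from v$sysstat, where 'redo size' is the cumulative bytes generated since instance startup:

-- average redo rate in MB/s since startup (assumes a reasonably steady workload)
select st.value / ((sysdate - i.startup_time) * 86400) / 1024 / 1024 redo_mb_per_sec
from v$sysstat st, v$instance i
where st.name = 'redo size';

In my friend's case, 1GB every 20 minutes works out to roughly 0.85 MB/s, so slowing the switches to about one per hour would need logs of around 3GB each. Note that adding more groups does not change the switch rate itself; what extra groups buy you is more time for the checkpoint and the archiver to finish with a log before it has to be reused.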
Now that we have our database in place, we need some sort of weapon that can tell us how much REDO is being generated and how often log file switches are occurring, so that we can configure our REDO log files for such scenarios and keep database waits to a minimum. Below is a sample script that once again uses AWR data mining to show how much REDO is being generated, by how much the REDO log files should be grown, and how much time the database has spent waiting on log switches whenever they occurred on the system.
with t_interval as
(
select /*+ inline */ sysdate-30 begin, sysdate as end from dual
)
select
stats.dbid dbid,
stats.instance_number instance_number,
to_char(stats.begin_interval_time, 'YYYYMMDD HH24MI') tim,
stats.cpu_used / 100 cpu_sec,
stats.phy_reads phy_reads,
stats.phy_writes phy_writes,
stats.cr_served cr_served,
stats.current_served current_served,
stats.redo_size / 1024 / 1024 redo_mb,
procs.current_utilization processes,
--
waits.df_seq_micro / 1000 / nullif(waits.df_seq_waits,0) avg_df_seq,
waits.df_scat_micro / 1000 / nullif(waits.df_scat_waits,0) avg_df_scat,
(stats.redo_size - stats.md_redo_size) * 100 / stats.md_redo_size redo_diff_to_md_pct,
stats.redo_write_time * 10 / nullif(stats.redo_writes, 0) avg_lfpw,
waits.log_sync_micro/nullif(waits.log_sync_waits, 0) / 1000 avg_log_sync,
waits.log_ckpt_micro/1e6 log_ckpt_sec,
( stats.redo_size /
( waits.snap_interval * 86400 ) ) *
( waits.log_ckpt_micro/1e6 ) / 1024 / 1024 redo_needed,
stats.is_restart
from
(
-- per-snapshot deltas of the cumulative counters in dba_hist_sysstat,
-- plus the per-instance median redo size used for the comparison column
select
snap_id,
begin_interval_time,
instance_number,
dbid,
cpu_used,
phy_reads,
phy_writes,
cr_served,
current_served,
redo_size,
redo_write_time,
redo_writes,
is_restart,
median(redo_size) over (partition by dbid, instance_number) md_redo_size
from
(
select
snap_id,
begin_interval_time,
instance_number,
dbid,
max(is_restart) is_restart,
max(decode(stat_name, 'CPU used by this session', delta)) cpu_used,
max(decode(stat_name, 'physical read total IO requests', delta)) phy_reads,
max(decode(stat_name, 'physical write total IO requests', delta)) phy_writes,
max(decode(stat_name, 'gc cr blocks served', delta)) cr_served,
max(decode(stat_name, 'gc current blocks served', delta)) current_served,
max(decode(stat_name, 'redo size', delta)) redo_size,
max(decode(stat_name, 'redo write time', delta)) redo_write_time,
max(decode(stat_name, 'redo writes', delta)) redo_writes
from
(
select
st.snap_id,
st.instance_number,
st.dbid,
sn.begin_interval_time,
st.stat_name,
-- AWR counters are cumulative since instance startup, so diff them per snapshot
st.value - lag(st.value) over (partition by st.dbid, st.instance_number, st.stat_id order by sn.begin_interval_time) delta,
-- flag intervals that span an instance restart: the delta there is meaningless
case when sn.begin_interval_time >= sn.startup_time and lag(sn.begin_interval_time) over (partition by st.dbid, st.instance_number, st.stat_id order by sn.begin_interval_time) < sn.startup_time
then 'Yes'
end is_restart
from
dba_hist_sysstat st,
dba_hist_snapshot sn,
t_interval t
where
sn.snap_id = st.snap_id and
sn.dbid = st.dbid and
sn.instance_number = st.instance_number and
sn.begin_interval_time between t.begin and t.end and
st.stat_name in (
'redo size',
'redo write time',
'redo writes',
'CPU used by this session',
'physical read total IO requests',
'physical write total IO requests',
'gc cr blocks served',
'gc current blocks served'
)
)
group by dbid, instance_number, begin_interval_time, snap_id
)
) stats,
(
-- per-snapshot deltas of the relevant wait events in dba_hist_system_event
select
snap_id,
begin_interval_time,
instance_number,
dbid,
max(snap_interval) snap_interval,
max(decode(event_name, 'log file sync', wait_micro)) log_sync_micro,
max(decode(event_name, 'log file sync', total_waits)) log_sync_waits,
max(decode(event_name, 'log file switch (checkpoint incomplete)', wait_micro)) log_ckpt_micro,
max(decode(event_name, 'log file switch (checkpoint incomplete)', total_waits)) log_ckpt_waits,
max(decode(event_name, 'db file sequential read', wait_micro)) df_seq_micro,
max(decode(event_name, 'db file sequential read', total_waits)) df_seq_waits,
max(decode(event_name, 'db file scattered read', wait_micro)) df_scat_micro,
max(decode(event_name, 'db file scattered read', total_waits)) df_scat_waits
from
(
select
e.snap_id,
e.instance_number,
e.dbid,
sn.begin_interval_time,
-- snapshot interval in days, used later to turn redo per snapshot into a rate
cast(sn.begin_interval_time as date) - cast(lag(sn.begin_interval_time) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time) as date) snap_interval,
e.event_name,
e.time_waited_micro - lag(e.time_waited_micro) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time) wait_micro,
e.total_waits - lag(e.total_waits) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time) total_waits
from
dba_hist_system_event e,
dba_hist_snapshot sn,
t_interval t
where
sn.snap_id = e.snap_id and
sn.dbid = e.dbid and
sn.instance_number = e.instance_number and
sn.begin_interval_time between t.begin and t.end and
e.event_name in (
'log file sync',
'log file switch (checkpoint incomplete)',
'db file sequential read',
'db file scattered read'
)
)
group by dbid, instance_number, begin_interval_time, snap_id
) waits,
(
-- current process count from dba_hist_resource_limit
select
rl.snap_id,
rl.instance_number,
rl.dbid,
rl.current_utilization
from
dba_hist_resource_limit rl,
dba_hist_snapshot sn,
t_interval t
where
sn.snap_id = rl.snap_id and
sn.dbid = rl.dbid and
sn.instance_number = rl.instance_number and
sn.begin_interval_time between t.begin and t.end and
rl.resource_name = 'processes'
) procs
where
waits.dbid = stats.dbid and
waits.instance_number = stats.instance_number and
waits.snap_id = stats.snap_id and
waits.dbid = procs.dbid and
waits.instance_number = procs.instance_number and
waits.snap_id = procs.snap_id
order by
stats.dbid, stats.instance_number, stats.begin_interval_time
;

In the output, the column REDO_MB is the amount of redo generated in each snapshot interval, REDO_NEEDED is the additional REDO log size needed to ride out the log switch waits, and AVG_LOG_SYNC is the average 'log file sync' (commit) wait in milliseconds; LOG_CKPT_SEC shows the time spent waiting on 'log file switch (checkpoint incomplete)'. The query also provides additional information such as CPU utilization and physical I/O, all of which helps in finding the right combination of REDO log file size, count, and groups so the database can perform as optimally as possible.
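And once the numbers point to undersized logs, applying the fix is the usual add-then-drop cycle. A hypothetical example (the group numbers, file paths and the 3GB size are purely illustrative, not a recommendation):

-- add new, larger groups alongside the existing ones
alter database add logfile group 5 ('/u01/oradata/ORCL/redo05.log') size 3g;
alter database add logfile group 6 ('/u01/oradata/ORCL/redo06.log') size 3g;

-- cycle the logs so the old groups go INACTIVE, then drop them one by one
alter system switch logfile;
alter system checkpoint;
alter database drop logfile group 1;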
That is all for this time. I hope to see you all soon with my next post. Feel free to drop a comment, or let me know if there is something you would like me to write about.
