Friday, March 30, 2012

Recovery of database - 23 Days?!?

I cancelled an INSERT operation on a database. Now I get:

"Recovery of database 'BounceRepository' (6) is 0% complete (approximately 2024654 more seconds)"

That is another 23 days of straight recovery time!!! The original INSERT only ran for about 30 hours.

In Windows Performance Monitor, I can see that the system isn't disk or CPU limited. What is the server doing? How can I make it go faster?|||Ah, don't get too wired up about the estimated recovery time. That is one of the most ill-tuned features of SQL-2000. I'd guess you have something like 6-8 hours for recovery, depending on what else the box might be doing.

You can make the box go faster right now by carrying it to the nearest window. They accelerate at something like 9.8 m/sec once they go out the window, so it doesn't take very long before they are going pretty fast!

To cut down on recovery time in the future, you want to set the recovery interval (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/adminsql/ad_config_70ry.asp) down a bit.
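
A minimal sketch of that change from Query Analyzer ('recovery interval' is an advanced option, and the value is in minutes):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'recovery interval', 1  -- target roughly one minute of redo work per database at startup
RECONFIGURE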

-PatP|||I wish I believed you:

At 10:30 AM, it estimated 0% complete and 2039505 seconds left
At 5:57 PM, it estimated 1% complete and 2011108 seconds left.

So far, that's pretty accurate.

This system isn't being used by anything else at all. It's completely off-line until it is recovered. I don't understand why it isn't disk bound or CPU bound. What could it be waiting for?

Secondly, from the MSDN link you posted:
"recovery interval does not affect the time it takes to undo long-running transactions."|||Unfortunately, database recovery is one of those "black box" processes which are undocumented. And sometimes what actually seems to be happening is that the data structures are hosed and the recovery engine is simply running around in a loop.

Usually these problems occur in a database that is very highly indexed, and the computer is blindly trying to update each and every index with each and every insert. With millions of records that could easily take ... well ... a month.

What you really need to achieve here is to be able to get back to your data, and to destroy all of the indexes. Bludgeon your way back to a point, somehow someway, where you just have the data and no indexes.
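
If you can still get a connection in, one hypothetical way to find the bludgeon targets on SQL 2000 (eyeball the generated statements before running any of them; 'YourTable' is a placeholder):

-- Hypothetical helper: generate DROP INDEX statements for one table's nonclustered indexes.
SELECT 'DROP INDEX YourTable.' + name
FROM sysindexes
WHERE id = OBJECT_ID('YourTable')
    AND indid BETWEEN 2 AND 254                      -- nonclustered indexes only
    AND INDEXPROPERTY(id, name, 'IsStatistics') = 0  -- skip auto-created statistics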

|||Actually, it's 9.8 meters/second^2

Physics scales well even past the Enterprise application level and clear up to galactic, where performance starts to suffer.|||Actually, it's 9.8 meters/second^2

Physics scales well even past the Enterprise application level and clear up to galactic, where performance starts to suffer.

Hey! Somebody et my
(a superscript 2). I'm bummed.

-PatP|||[THUD (Falling off barsto...office chair)]
[QUOTE=RogerWilco]The original INSERT only ran for about 30 hours.
[/QUOTE]

You're joking...

[/THUD (Falling off barsto...office chair)]|||He's not joking. But while Roger has posted several questions on this forum regarding how slow his queries are running, I don't recall him answering questions about how his tables are indexed. Really, that's the first thing to look at. Insufficient indexing or over-indexed tables can severely impact performance.|||F that

Anything taking 30 hours is ridiculous...

Go with minimally logged operations...SELECT * INTO

bcp

BULK INSERT

Small batches with commits...

anything...|||He's not joking. But while Roger has posted several questions on this forum regarding how slow his queries are running, I don't recall him answering questions about how his tables are indexed. Really, that's the first thing to look at. Insufficient indexing or over-indexed tables can severely impact performance.

At this point I really don't think that Roger wants help; it seems that he just wanted a place to vent.

-PatP|||Well, he has plenty of time to do that!

Sorry Roger. I do feel for you, and best of luck getting everything back up and running.|||Thank you Pat, blindman, and Brett! The three of you have been very helpful in answering several of my queries. Believe me, I seriously want help!!! Although, I do sometimes cross into venting. You can understand my frustration; 23 day+ recovery times and 50 hour+ long (and running) UPDATE commands with no way to get progress or safely abort!??!

I have a very extensive programming background and I know lots about data structures and databases, but I readily admit that my database expertise is inadequate for the tasks that I am faced with.

Any of you three interested in moonlighting doing hourly contract work? Over phone/Internet is fine but would any of you by chance live in Texas? Email me rate information and the times and number of hours you'd be able to work at wilco909-at-hotmail.com.

fyi, the recovery process is still running and so far the estimates are scarily accurate:
Estimated at 4% complete and 1946255 more seconds.|||When Roger says that he "cancelled" the insert, I take it he simply killed the SQL Server service, because I don't see any reference to something that would indicate a "kill" being issued against a spid. This will yield a VERY long recovery process. To avoid it, you can detach the database, rename the original log file (I assume it's in hundreds of gigs by now), and then use CREATE DATABASE ... FOR ATTACH.

Let us know how it went.|||Where in Texas?|||I don't recall him answering questions about how his tables are indexed. Really, that's the first thing to look at. Insufficient indexing or over-indexed tables can severely impact performance.

I do appreciate the responses and hate to not get back to you on things like that.

I did a BULK INSERT from a flat file into a basic non-indexed table, and then did an INSERT INTO ... SELECT into the following table. I cancelled it, intending to rerun it as a DTS package, and now the cancel is expected to take weeks!!!

CREATE TABLE Removables
(
EmailUser VARCHAR(50) NOT NULL,
EmailDomain VARCHAR(50) NOT NULL,
BounceCategoryID INTEGER NOT NULL,
EntryDate DATETIME DEFAULT getdate() NOT NULL,

CONSTRAINT PK_Removables PRIMARY KEY CLUSTERED (EmailUser, EmailDomain)
)
ALTER TABLE Removables WITH CHECK ADD CONSTRAINT FK_Removables_BounceCategoryID FOREIGN KEY (BounceCategoryID) REFERENCES BounceCategories
|||When Roger says that he "cancelled" the insert, I take it he simply killed the SQL Server service, because I don't see any reference to something that would indicate a "kill" being issued against a spid. This will yield a VERY long recovery process. To avoid it, you can detach the database, rename the original log file (I assume it's in hundreds of gigs by now), and then use CREATE DATABASE ... FOR ATTACH.

Let us know how it went.

I ran the INSERT INTO SELECT operation directly in Query Analyzer. After approximately 30 hours I hit the cancel button and intended to rerun as a DTS transformation rather than an INSERT. This took a while and then we rebooted the server. Got the recovery message, rebooted again, and it's still there. I still find it odd that it isn't CPU or disk or network bound; usually that means it isn't busy. The server is extremely unresponsive. I can't even launch Query Analyzer and connect to it and run basic SQL commands.

Will my data still be intact if I detach? I may not even be able to detach the database since I can't even connect to the server.

I'm in Austin, TX btw|||Why didn't you just BULK INSERT into that table in the first place...|||Why didn't you just BULK INSERT into that table in the first place...

The final table already has lots of necessary data in it and can't contain duplicates.

So I did a BULK INSERT into a temporary table and then wrote an INSERT INTO ... SELECT that did the dedup/merge.|||I'm in Austin, TX btw

Too bad this wasn't a problem a year ago. Joe Celko is one of the best in the business at organizing/solving large database problems, and a year ago he lived in Austin!

-PatP|||Too bad this wasn't a problem a year ago. Joe Celko is one of the best in the business at organizing/solving large database problems, and a year ago he lived in Austin!

-PatP

Just googled him. Sounds great but seems like he would be hard to get.

I take it you guys have enough crises on your hands?|||I'd die of boredom if things ever settled down. Fortunately, there is little chance of that ever happening at the orifice, and even less at home.

I used to think that I knew what "busy" was, but since mid-April we've stepped up to a whole new level.

I still make a few minutes now and then to check here to see what is happening. So many times people just need a nudge in the right direction, and what seemed like a catastrophe becomes just another "oh yeah, I remember that" kind of event.

-PatP|||The db will go through a full recovery process (someone said here that it's a mystery, - no mystery here, committed trans get rolled forward, uncommitted - rolled back)

At this point you need to
- stop the service,
- rename both data and log files,
- and restart the service.

The db will be marked suspect.
- Drop it,
- rename the data file back to the original name,
- issue CREATE DATABASE...FOR ATTACH.

The log file will be created since none is specified.|||There may be areas to improve your INSERT INTO SELECT statement. Are you trying to dedup and insert all at once? You know, insert into where not exists? Indexes on the tables could be critical for this.

You might be better off removing the duplicates from your staging table first.

delete staging from staging inner join production on staging.[key] = production.[key]

...and then doing your insert.
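
If the staging table is big, a batched version of that delete keeps each transaction (and the log) small. A rough sketch using the SQL 2000 SET ROWCOUNT idiom (staging, production, and [key] are placeholder names, as above):

SET ROWCOUNT 100000  -- cap each pass at 100k rows; each DELETE commits on its own
WHILE 1 = 1
BEGIN
    DELETE staging
    FROM staging
    INNER JOIN production ON staging.[key] = production.[key]
    IF @@ROWCOUNT = 0 BREAK  -- nothing left to delete
END
SET ROWCOUNT 0  -- restore the default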

Post your INSERT INTO SELECT code.|||At this point you need to
- stop the service,
- rename both data and log files,
- and restart the service.

The db will be marked suspect.
- Drop it,
- rename the data file back to the original name,
- issue CREATE DATABASE...FOR ATTACH.

The log file will be created since none is specified.

I did:

CREATE DATABASE BounceRepository
ON PRIMARY (FILENAME = 'E:\SQLServerData\BounceRepository.mdf')
FOR ATTACH

That resulted in:
Could not open new database 'BounceRepository'. CREATE DATABASE is aborted.
Device activation error. The physical file name 'D:\SQLData\MSSQL\Data\BounceRepository_Log.LDF' may be incorrect.

Obviously I renamed the log file so it wouldn't simply resume the recovery. What can I do?|||Run EXEC sp_attach_single_file_db 'BounceRepository', 'E:\SQLServerData\BounceRepository.mdf'

Feel for you man.|||Run EXEC sp_attach_single_file_db 'BounceRepository',''E:\SQLServerData\BounceReposi tory.mdf'

I ran that and got the same error:
Device activation error. The physical file name 'D:\SQLData\MSSQL\Data\BounceRepository_Log.LDF' may be incorrect.

I tried making a dummy database, detaching it and using that log file. I was hoping to fool it into using just a blank transaction log that didn't need to be restored. I got the error:
Cannot associate files with different databases.|||OK, I guess you need to specify a log device when using CREATE DATABASE...FOR ATTACH. Anyway, I detached PUBS database, deleted the log file, and ran the following from QA:

CREATE DATABASE pubs
ON
( NAME = pubs_data,
FILENAME = 'e:\sql\datadevices\pubs.mdf',
SIZE = 10,
MAXSIZE = 50,
FILEGROWTH = 1MB )
LOG ON
( NAME = 'pubs_log',
FILENAME = 'e:\sql\logdevices\pubs_log.ldf',
SIZE = 5MB,
MAXSIZE = 25MB,
FILEGROWTH = 1MB )
for attach

PUBS.MDF file was the original, while PUBS_LOG.LDF got created from scratch (I guess that's what I meant.)|||OK, I guess you need to specify a log device when using CREATE DATABASE...FOR ATTACH. Anyway, I detached PUBS database, deleted the log file, and ran the following from QA:

I tried the following where the log file didn't exist:

CREATE DATABASE BounceRepository
ON
( NAME = bounce_data,
FILENAME = 'E:\SQLServerData\BounceRepository.mdf',
SIZE = 10,
MAXSIZE = UNLIMITED,
FILEGROWTH = 10% )
LOG ON
( NAME = 'bounce_log',
FILENAME = 'D:\SQLData\MSSQL\LOG\BounceRepository.ldf',
SIZE = 5MB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 10% )
for attach

And I get:
Device activation error. The physical file name 'D:\SQLData\MSSQL\LOG\BounceRepository.ldf' may be incorrect.

This stuff is tough!! Thanks for taking the time to help|||I looked up sp_attach_single_file_db, and found:

When sp_attach_single_file_db attaches the database to the server, it builds a new log file and performs additional cleanup work to remove replication from the newly attached database.

Use sp_attach_single_file_db only on databases that were previously detached from the server using an explicit sp_detach_db operation.

So I assume you can recreate the log if you've gracefully detached the database rather than just stopping the service.|||Post your INSERT INTO SELECT code.

My plan was to rerun this in a DTS transformation rather than an INSERT statement. Then the abort on this INSERT turned fatal...

INSERT INTO Removables (EmailUser, EmailDomain, BounceCategoryID, EntryDate)
SELECT dbo.getEmailUser(OldRemovables.EmailAddress) AS EmailUser
, dbo.getEmailDomain(OldRemovables.EmailAddress) AS EmailDomain
, 8 AS BounceCategoryID
, getdate() AS EntryDate
FROM OldRemovables
WHERE NOT EXISTS (SELECT * FROM Removables WHERE Removables.EmailUser = dbo.getEmailUser(OldRemovables.EmailAddress) AND Removables.EmailDomain = dbo.getEmailDomain(OldRemovables.EmailAddress))
GROUP BY dbo.getEmailUser(OldRemovables.EmailAddress), dbo.getEmailDomain(OldRemovables.EmailAddress)
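
(For what it's worth, one likely cost in the statement above: every source row pays for four scalar UDF calls, plus two more inside the NOT EXISTS probe. A hypothetical rework that computes them once into staging columns, so the dedup compares plain, indexable columns:)

-- Hypothetical rework, not what was actually run; assumes OldRemovables can take two new columns.
ALTER TABLE OldRemovables ADD EmailUser VARCHAR(50) NULL, EmailDomain VARCHAR(50) NULL
GO
UPDATE OldRemovables
SET EmailUser = dbo.getEmailUser(EmailAddress),
    EmailDomain = dbo.getEmailDomain(EmailAddress)

INSERT INTO Removables (EmailUser, EmailDomain, BounceCategoryID, EntryDate)
SELECT o.EmailUser, o.EmailDomain, 8, getdate()
FROM OldRemovables o
WHERE NOT EXISTS (SELECT * FROM Removables r WHERE r.EmailUser = o.EmailUser AND r.EmailDomain = o.EmailDomain)
GROUP BY o.EmailUser, o.EmailDomain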

|||To summarize:

If you have a healthy, detached database, you can recreate the database from just the data file and SQL Server will create a new log file and there will be no lengthy recovery procedure. This can be done by several similar commands:

CREATE DATABASE ... FOR ATTACH
sp_attach_db
sp_attach_single_file_db
and a few more

However, apparently, you can't simply stop SQL Server, steal the data files from a "recovering" database, and try to create a new database from that data file. It simply refuses to use the data file without the corresponding log file. Trying to fool SQL Server into using a log file from a new database or a different database doesn't work.

Additionally, while the database is recovering, SQL Server is completely unusable. Enterprise Manager, Query Analyzer, and bcp all just permanently hang when I try to use them. If I could get some data out of this database, I could blow it away and recreate and load everything. I have scripts to create all the table structure and stored procs.

Is there anything I can do at all?|||I'd byte the bullet at this point, and call in the "big guns". MS-PSS (http://support.microsoft.com/default.aspx?scid=fh;en-us;Prodoffer41a&sd=GN) is Microsoft's Professional Support Services, they are pricey (ranging from $99 for 24 hour support, through $2,000 for full blown server support, with many steps in between), but they can bring serious support to bear on problems that the web just can't match.

-PatP|||I don't have a 70G-database handy that I can try to mess up so bad at this point, but I did it with a 10G db (good thing I have 160G external drive ;))

I modified the startup parameter for MSSQLSERVER to NOT start at system startup.

I made a copy of the original by stopping the service, copying the files to different names, and then starting the service and using sp_attach_db.

Then I unloaded one of the largest tables (only 10.5 mil records), dropped all related constraints, truncated the table, and fired BULK INSERT with no TABLOCK or BATCHSIZE parameters.

Halfway through, I unplugged the server from the power (please don't try this at home; I got very nervous)....

Then I plugged it back in and prayed...Win came up OK (2KEE).

Then I renamed the log file of the database (it used to be 1024K, now it's 4096K, growth was 100M increment)

Then I renamed the data device and started the service. The database was suspect (of course), and I dropped it. Using sp_attach_single_file_db I successfully attached it, and it gave me a "Device activation error" on the old log file, and created a new one.

Then I detached the database and copied the log device from DataDevices to LogDevices (I just like it this way ;)) and used sp_attach_db to specify new location for the log.

I am pretty sure that I could have easily used CREATE DATABASE...FOR ATTACH and achieved pretty much the same thing.

BTW, the table that I truncated at the beginning of all this contained no data, which is not surprising because the entire load was viewed as 1 transaction, and since the new log device contained no reference to this transaction, - it got lost.|||I don't have a 70G-database handy that I can try to mess up so bad at this point, but I did it with a 10G db (good thing I have 160G external drive ;))

Thanks for the extra tips. But, the problem seems to be resolved (read below). Do let me know if you would be interested in contract work in the future. We have some pretty bold plans for an automated, complex, high volume database driven system and could use an advisor on board with the database experience that my team is currently lacking.

After digging through the SQL Server error log, I discovered that this server was running slowly because it was running SQL Server 2000 Personal Edition. I'm somewhat embarrassed.

I didn't install SQL Server on this system and never even considered the possibility that someone would have put Personal on a high volume server.

Anyway, with Enterprise Edition (Standard probably would have sufficed), the recovery time dropped from 23 days to 26 hours. That is tolerable. And the system is disk bound which makes sense (previously, it was seemingly idle). Systems should always be either running under capacity or resource bound... Or running crippled software.

Finally, the database crises are seemingly resolved :)

After four 7 day/60-65 hour weeks, I'm flying out of town tomorrow morning for a much needed three day weekend :)

thanks again, everyone!|||Glad to hear that your troubles are gone, hopefully for good (yeah, right, like this is gonna happen to any of us ;))

Next time, if you need to perform such large inserts/updates, consider taking it out of the direct Transact-SQL realm and look at the other SQL-related utilities (BCP, BULK INSERT). You may find the task much easier to accomplish.
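
For example, a rough sketch of a batched load (the table name and file path here are made up for illustration):

-- Illustrative only: StagingRemovables and the path are placeholder names.
BULK INSERT StagingRemovables
FROM 'E:\Loads\removables.txt'
WITH (TABLOCK, BATCHSIZE = 100000)  -- each 100k-row batch commits on its own, so a cancel only loses the current batch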

You can send me a private message or reply to my email in regards to future engagements.|||After digging through the SQL Server error log, I discovered that this server was running slowly since it was running SQL Server 2000 Personal Edition. I'm somewhat embarassed.

WOW. :) That makes a lot of things you have been saying on here suddenly make sense. lol
