StorageSwitched!

Fixed Content Fixation

What were they thinking?

OMG (in heavy valley girl slur).

I was talking with a friend last week about an IT shop that sent tapes to an off-site vendor to be ‘tested’ for a possible migration. Somehow one of the tapes was lost. Who knows how - it doesn’t matter. What does matter is that the test data was real data, and it was the only copy of the data.

How could they send their primary copy of data, without any backups anywhere?

Ok data protection 101. A backup is a copy of the data on separate media from the primary, preferably in a separate location. Nuff said.
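To make the 101 rule concrete, here is a minimal Python sketch (not anybody’s actual procedure): copy the file to separate media, confirm the copy matches the original, and only then let the original leave the building. The function names, the file name and the two temporary directories standing in for the primary and backup media are all made up for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_then_verify(primary: Path, backup_dir: Path) -> Path:
    """Copy primary into backup_dir and confirm the copy matches it."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / primary.name
    shutil.copy2(primary, copy)
    if sha256_of(primary) != sha256_of(copy):
        raise IOError(f"backup of {primary} does not match the original")
    return copy


if __name__ == "__main__":
    # Hypothetical stand-ins: one temp directory plays the primary media,
    # the other plays the separate backup media.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        primary = Path(src) / "migration_test.dat"
        primary.write_bytes(b"the only copy of the data")
        copy = backup_then_verify(primary, Path(dst))
        print("verified copy exists, original may be shipped:", copy.exists())
```

In real life the backup directory would sit on different media, preferably in a different location; a second folder on the same disk only protects you from yourself.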

February 14th, 2008 Posted by Clark Hodge | Uncategorized | no comments

A change to the way I work

And if this isn’t another case of ‘lock-in’…

I’m still having issues moving my entire environment over to Linux. After a big crash over Thanksgiving (the bloat must’ve finally killed my Windows install) I made the decision to move to Ubuntu.

When I crashed - Windows couldn’t do anything. I then used a LiveCD version of Ubuntu, and was able to see all of my data, and back it up - to the point of the crash. So, I was happy - absolutely no data loss.

But Windows couldn’t repair the problem. Tried all of the Windows procedures and utilities and nothing. Couldn’t see the data, couldn’t fix anything. Dead. Only solution: scrap the drive and reload everything.

But wait - there was nothing wrong with the content. Just the OS. So… I decided if Win couldn’t help me - this would be the last time that I’d be stuck in this boat.

Sorry, long story. I’m trying to move over my apps/data to Ubuntu. However, even with WINE (the Windows compatibility layer) lots of things don’t work like you’d expect. So, I’m trying to go with native Linux apps wherever possible. Blogging has been one of those things: my blogging app (Qumana) didn’t run on the first pass, and I haven’t had time to make it work.

(Note to the WINE community: Thanks, you folks are great! You’ve been very supportive and interested in helping even us newbies get things working).

Hope to have something running soon, and to be a more frequent poster again.

I see the Army is doing the same thing. Read the comments - some interesting notes on ‘perspective’. Army moving to linux

What would you do if you had to move operating systems, (or even just apps)? Would your data survive? If not, maybe now is the time to think about it - so you don’t have to do it during a state of emergency - like me.

February 6th, 2008 Posted by Clark Hodge | Uncategorized | one comment

Nobody knows it yet,

… but storage of the future, regardless of media will be object oriented.

Most people don’t realize it yet, but the future of data storage is object based. It has to be. It provides encapsulation, location independence, media independence, and object attributes such as authenticity. It maintains all the ‘old’ style attributes of date/time/name/logical location, but extends that with alternate logical locations, additional data, variable metadata (on some systems) and fixed metadata. You can move things around without having to tell the user or the application.
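Here is a toy Python sketch of what those properties could look like in practice. It is purely illustrative and is not how any of the products named in this post work internally: the class names, the location labels and the choice of a SHA-256 content hash as the object ID are all assumptions of the sketch.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)   # name, dates, anything else
    locations: set = field(default_factory=set)    # where copies currently live


class ObjectStore:
    """Toy object store: the object ID depends only on the bytes stored."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, location: str, **metadata) -> str:
        object_id = hashlib.sha256(data).hexdigest()
        obj = self._objects.setdefault(object_id, StoredObject(data))
        obj.metadata.update(metadata)
        obj.locations.add(location)
        return object_id

    def get(self, object_id: str) -> bytes:
        obj = self._objects[object_id]
        # Authenticity check: the bytes must still hash to their own ID.
        if hashlib.sha256(obj.data).hexdigest() != object_id:
            raise ValueError("object failed its integrity check")
        return obj.data

    def move(self, object_id: str, old: str, new: str) -> None:
        """Relocate a copy; the ID the application holds does not change."""
        locations = self._objects[object_id].locations
        locations.discard(old)
        locations.add(new)

    def where(self, object_id: str) -> set:
        """Return a copy of the set of locations holding this object."""
        return set(self._objects[object_id].locations)


if __name__ == "__main__":
    store = ObjectStore()
    oid = store.put(b"quarterly report", "disk-array-1", name="q3.doc")
    store.move(oid, "disk-array-1", "tape-library-7")
    # The application still asks for the same ID, wherever the bytes went.
    print(store.get(oid), store.where(oid))
```

The application only ever holds the ID, so the storage is free to shuffle the bytes between media without telling anyone.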

The latest storage vendor to join in with an object-oriented storage system is Sun Microsystems. This week they announced the Sun/STK 5800 (or, as known by its much cooler engineering moniker, Honeycomb). Others that play in this space include EMC Centera, Caringo CASStor, HDS HCAP, and NexSAN Assureon.

Sun has added a new component to the object-oriented mix. Intelligence. Intelligence inside the storage, that is. They’ve opened up the storage so that not only does it store your data, it is capable of doing things with it. Keep an eye out, as this new component will get very interesting. Automated protections, encryptions, extractions, modifications, transformations, reformatting etc. can all come from the storage, maximizing network bandwidth and server processing power. It’s one step closer to ‘smart objects’, a concept that is the next step in data storage.

I don’t know what brand of storage there will be in the future, but I do know it will be object oriented!

..clark hodge

November 14th, 2007 Posted by Clark Hodge | Uncategorized | one comment

Keep your ‘dead’ machines

I’ve argued before about the need to keep all of your information, forever. Here’s an interesting blog article by W. Lawrence Wescott II on a good reason to keep even your broken data storage / computers:

  "Producer’s destruction of “crashed” computer leads to sanctions".

Not even a dead drive is safe harbor any longer…

November 12th, 2007 Posted by Clark Hodge | Uncategorized | no comments

Simulation for enhancing long term thinking

I’ve always been a big fan of simulation.  Simulations can be great  tools for a variety of functions.

Simulations enable us to become more skilled than was possible without them.

As my kids grow up, I’m watching with interest the skills that they gain with simulation.  When they were four or so, they played with a bulldozer simulator.  No, not one for real bulldozers, but one targeting the ‘Tonka’ kids.  They had to drive around using not arrow keys, nor a joystick, but dual stick controls - like a real bulldozer.  As they’ve grown - I watch the skill with which they interact with more advanced simulations.  Kids now utilize a variety of simulations (race car simulations, flight simulators, zoo management, amusement park management, life simulators, sports simulations etc.) to immerse themselves into worlds in a safe and educational environment.  What’s interesting to me - is how seriously they take these simulations.  They fly with finesse, where I, on the other hand, know that I can’t die, so I fly full throttle everywhere.  I don’t even bother landing - I just crater my craft.

I’ve got a close friend that taught himself to fly remote control (RC) helicopters with a simulator.  Knowing the probability of expensive failure - he spent literally hundreds of hours on the simulator - perfecting his skills before going outside.  Once outside - he quickly proved himself as a top-notch RC pilot. 

TED recently posted video of Will Wright. (Wonder if his middle name is ‘Bur’???) Will talks about his upcoming simulation (game), ‘SPORE’. Spore (not released yet) is a simulation of philosophy, life, evolution, with good bits of physics and sociology mixed in.


Will states that "Computer simulations can re-calibrate your instincts across vast scales of both space and time."  These simulations help us think through long term scenarios, because they can ‘compress’ our world. If you could simulate your business and data environment and see the effects of different policy decisions wouldn’t you be a better information manager? 

Just as simulations can make people better heavy equipment operators, pilots, etc., business and socio-political simulations can make us better stewards of digital assets. Simulations aren’t ‘perfect’, as they are required to make assumptions. However, I foresee growth in business simulations to aid us in becoming better managers. When we focus more on long term thinking (as opposed to the Wall Street driven quarterly thinking we have now), we’ll make better long term decisions.

November 12th, 2007 Posted by Clark Hodge | Uncategorized | no comments

I’ve got my integer!

2D 2E 11 45 7D 45 C6 3F 3B 9D E3 1D 7D F8 76 32

Is MINE!!!

Ok, sometimes you just miss something in the press, and it really makes you feel out of touch.  About 6 months ago, I missed how the AACS LA (Advanced Access Content System Licensing Administrator, LLC) was going after people for putting ‘their’ integer on websites.

Basically, AACS LA was sending letters to people requesting them to stop posting a certain 128-bit integer. They claimed that they had sole rights to it, even though it was not copyrighted. The number was a decryption key used in a lot of HD DVD and Blu-ray movies. Sure seems funny to me that:

  • They used the same key for a lot of content.
  • They didn’t protect the key adequately.
  • A lot of really smart people bought into this protection method.
  • They think their lawyers can put the genie back in the bottle.
  • And finally - that they haven’t wised up to the bigger problems of DRM. 

I went ahead and I got my own number - a little worried that they might run out… from Freedom To Tinker.  It’s mine, don’t even think of using it!  Both Freedom To Tinker and Wikipedia have more background.

Seems to me that AACS LA has made a number of technical, business and social errors.

So if IBM, Intel, Microsoft, The Walt Disney Company, Warner Brothers Studios, Panasonic, Sony and Toshiba (the founders behind the face of AACS LA) can’t figure out how to protect their content (both the media and the keys), then - how about you?   How do you protect your content? 

November 7th, 2007 Posted by Clark Hodge | Uncategorized | 2 comments

Could you find it?

:-)

No image, just a little ASCII art in today’s post. RetroThing has a nice entry re: a tape recovery from 1982. That was important historical data on a 9-track tape. Not sure of the recording density back then. I’m guessing that it was probably 800 or 1600 bpi (that’s bits per inch!). To make it visual - imagine a piece of magnetic tape, 1/2" wide by one inch long - that was capable of holding 800 to 1600 characters, because the nine tracks record the bits of each character side by side (more than this entire paragraph, without the underlying hyperlinks).

I wonder what the probability of success is on:

  1. finding people that knew enough about the data to have clues as to when / where it was stored
  2. finding the media where the information was
  3. finding a device to read the media
  4. being able to read the media
  5. being able to interpret the data on the media. Actually, this one is old enough that the encoding was probably pretty straightforward - probably just a flat ASCII log file, with no pesky compression/encryption/application encoding going on.

Could you find data from 20 years ago?

October 30th, 2007 Posted by Clark Hodge | Uncategorized | no comments

ILM data loss induced by improper classification

I recently experienced a data loss that cost me an opportunity. No, it wasn’t a disk failure, nor a software data corruption.  It wasn’t a case of inadvertent nor intentional deletion.

As a matter of fact, nothing wrong with the data.  But still I experienced data loss.

The data was just in the wrong place. 

How did it get moved to the wrong place?  Some ILM software decided to ‘move’ my data.  However - it wasn’t the right thing to do.

In this case, it was spam filtering software that decided (incorrectly) about the status of a piece of email. It thought that this piece was spam, and therefore it should be moved to my spam folder. (I’ve been burned by these ILM packages before - so instead of just deleting supposed spam - I throw it into a folder - for later manual review.)

However, I only check that folder every month or so.  But the op in this particular email was only good for a couple of weeks.  I lost.

It’s still data loss. If software ‘moves’ the data so that it is no longer available, or no longer presented where it should be, it’s plain and simple - data loss. Don’t let it happen to you.

October 29th, 2007 Posted by Clark Hodge | Uncategorized | no comments

Serendipitous world

Actually I just fell serendipitously into a great video post on TED. I love it because it wasn’t something I was looking for, yet it is interesting, FUN and applicable. Fun, not because of the subject matter (the workings and future of dictionaries - yawn), but because of the person - Erin McKean, her perspectives, and ENERGY. She has such a passion that it makes her topic interesting to anyone willing to listen (actually - her style forces your mind to participate intellectually). I’ll take the time to dig into more of what she posts - because my first uve with her was so enjoyable and stimulating. For those of you that don’t know, a uve is defined as follows:

uve n. \ˈüv\  - unidirectional virtual encounter. An engaging experience where the person cannot interact with the producer, yet feels strangely connected.  An intellectual connection where the two are not actually in communication.

 Remember you heard about uves here first!  (Maybe I’ll be famous and get my name in a history of words ‘book’ someday…). 

Engage your brain - check it out!  Come on, it only takes 16 minutes - DO IT!

For me - impassioned people are fun. Erin is definitely that.

So, how’s this relate to my fixed content fixation?   (I did say this was applicable didn’t I?)

I’ve never been a fan of classification, one of the basic tenets of Information Lifecycle Management (ILM) (prior post). It just doesn’t work.

A dictionary in its classical sense utilizes a classification system that is archaic (by language first, then alphabetically). A single classification system that is supposed to handle all words. But that’s not good enough anymore. Wouldn’t it be handy if it had all the words, and all the character sets? If I’m looking for a word I heard, or saw, that happens to be in another language - wouldn’t I want a dictionary of all human utterances? Or wouldn’t it be nice to have it restrict the information to a specific context? That way I wouldn’t waste time with all the non-needed, out-of-context definitions.

So something as basic as a dictionary’s classification system obviously isn’t working. What makes ‘storage professionals’ or self-proclaimed ‘informationists’ think they can classify complex information constructs if we can’t even do words?


Two great quotes on the concept of serendipity:

"The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’, but ‘That’s funny …’" Isaac Asimov

"Serendipity is looking in a haystack for a needle and discovering a farmer’s daughter." Julius Comroe Jr.


By the way, Erin’s website is DictionaryEvangelist.  Check it out, and see where it takes you…

October 10th, 2007 Posted by Clark Hodge | Uncategorized | no comments

Back in the swing

I fell out of the swing of things for a bit there - sorry. So, in the spirit of keeping this fun - check out this site: "The Eight Irresistible Principles of Fun". They too are thinking about long term stuff, but from another perspective. It’ll take a few minutes, but you shouldn’t be in so much of a hurry, should you? Excerpted from #5 -

Use your wisdom.  Stop taking it all so damn seriously.

In this moment, 2007, is it a life or death decision?

In 10 years, 2017, will you remember what you are fretting about?

In 100 years, 2107, will any one care?

So lighten up.  This too will pass.

October 5th, 2007 Posted by Clark Hodge | Uncategorized | no comments

Good Storage FUD

It’s time for a bit of FUD in the storage industry.  No, no, not the normal FUD (that ugly sales tactic of throwing Fear, Uncertainty and Doubt into the customer decision making process).  We’ve already got plenty of that.  I’m talking (ok writing) about Full Utilization of Disk.  Or more informally Fill Up my Disk. 

Unused space on a disk has absolutely no value - it’s just random noise.  So why not use that space intelligently?  FUD is making use of the entire disk - filling it with interesting ones and zeros.

What could go there?

1) Local data backups/snapshots

2) Prior versions of files (how long has it been since we’ve had a decent versioning file system?  For me it was VAX/VMS and Xerox ViewPoint - so roughly 20 years ago.)

3) Intelligent cache of remote things that I might need.

4) Encrypted garbage with misleading file names… (Give the officers something to do, if your drive ever falls into ‘foul’ hands.)

At what level should this occur? App/OS/Drive/Utility?  I dunno but somebody ought to do it.  It gets kinda weird if the app does it. One app would probably be very well protected while others wouldn’t see any free space.  (I’ve started using MIRO as my video feed and they grab videos that I may be interested in and will utilize up to a certain threshold of free space.)  If the storage does it, then it’s like any other storage function - it’s harder to control and manage.  If the OS (my favorite choice) does it, then it can control reporting of free space (remember - that’ll be the amount of free space on the drive - before all the non-user initiated FUD).  Thus any FUD protections above and beyond the normal policies would be available for deletion and space reclamation.
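Here is a rough Python sketch of the bookkeeping an OS-level FUD scheme might do. It only illustrates the accounting described above (free space is reported as if the FUD weren’t there, and FUD is the first thing evicted when the user needs room); the `Disk` class, its field names and the oldest-first eviction order are all assumptions of the sketch, not a description of any shipping product.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    capacity: int                              # total bytes on the drive
    user_bytes: int = 0                        # data the user asked to store
    fud: list = field(default_factory=list)    # (label, size) pairs, oldest first

    def fud_bytes(self) -> int:
        return sum(size for _, size in self.fud)

    def reported_free(self) -> int:
        """Free space as the user sees it: FUD does not count against it."""
        return self.capacity - self.user_bytes

    def add_fud(self, label: str, size: int) -> bool:
        """Store FUD only if there is truly empty space to put it in."""
        truly_empty = self.capacity - self.user_bytes - self.fud_bytes()
        if size > truly_empty:
            return False
        self.fud.append((label, size))
        return True

    def write_user_data(self, size: int) -> None:
        """User writes always win: evict the oldest FUD until the data fits."""
        if size > self.reported_free():
            raise OSError("disk full")
        while self.capacity - self.user_bytes - self.fud_bytes() < size:
            self.fud.pop(0)
        self.user_bytes += size


if __name__ == "__main__":
    disk = Disk(capacity=100)
    disk.write_user_data(40)
    disk.add_fud("snapshot", 30)
    disk.add_fud("prior-version", 30)
    print("reported free:", disk.reported_free())  # FUD is invisible here
    disk.write_user_data(25)                       # evicts the oldest FUD
    print("remaining FUD:", disk.fud)
```

A smarter policy would evict the least valuable FUD rather than the oldest, but that is exactly the kind of decision the OS is in the best position to make.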

There are a few utility programs out there that sit in a niche where most people can’t find them / don’t know about them. They include Symantec GoBack (which is only available buried in a bundle of other stuff) and Horizon Rollback. On the enterprise side there are several utilities / appliances that provide ‘continuous data protection’.

Why not FUD? Well, some would argue that it limits the performance of a drive. So put the FUD data on the worst performing portions of the drive, and schedule the FUD filling routines to minimize impact on the system. FUD is about increased protection and utility.

By the way - if you implement this, and feel good about better protecting your customers’ data, why not feel really good, and drop me a little royalty check every once in a while? <wink> And if you’ve got more FUD ideas or leads on who’s got FUD, drop ‘em in the comments.

August 9th, 2007 Posted by Clark Hodge | Uncategorized | 2 comments

Thinking Green…

The whole IT industry is talking green, but it’s not what you think.

My friend and fellow blogger Amy Gahran turned me on to an interesting ‘green IT blog’. Lots of good info and references to other sites in the posts. The blog got me thinking. I like green as much as the next guy, but too many IT vendors are looking at padding their wallets with a different green. They are working 7×24 trying to convince us to buy their products and save the planet.

One vendor says that my persistent data isn’t important enough for instant retrieval. If green IT means that I’ve got to wait a minimum of 13 seconds for a simple mouse click to return a file - I’m taking my green elsewhere. Sorry, but as 21st century information consumers, we’re an impatient bunch. For some backroom operations (e.g. backup) it’s got potential. Then again, when you have a system down that’s costing real money for every second it’s down - maybe not there either.

Another storage vendor recently spoke of how much energy would be saved if only you’d trade in last year’s product for their new stuff. Of course that’s great for the energy side of things, but what about the impact on landfills etc.? The total cost of premature retirement needs to be fully considered, not just the utility costs.

I’m wondering about all the dollars to add an additional tier (class) of storage - the ‘green’ class.  Now we’ve got Tier 1, Tier 2, Tier 3, Compliant, and Green classes of storage.  More or less depending on how you like to slice and dice.  Every new tier, adds complexity and management costs.  And at less than 100% utilization levels - for each additional tier you’re spinning more disk, that isn’t doing anything.

Others are riding on a de-dupe and compression pitch - (less data), but they require extra processing power and the expense for the engines (hardware and software) to perform those processes.

And, once again, people are oversimplifying their information management. The value of information increases significantly if it is available with integrity and in a timely fashion. I’ve harped on this before - there is value in information. BUT, only if it is available when users want it. I’ve spoken with literally thousands of users, and never has one said that they were willing to wait for their data. They want it NOW. IT guys have arbitrarily made those kinds of decisions, often without understanding the implications for costs or the value of the data to the business.

So, before jumping into something that’ll make your vendor money - make sure you’re doing what’s right for your data, your users and your business.

August 3rd, 2007 Posted by Clark Hodge | Uncategorized | 4 comments

Search engine 20 questions.

I just spoke with an old friend Juzar Hasta.  He’s now with Vivisimo.  We had the chance to talk a bit about search engine technologies. They claim to be able to only pull the relevant material on a search.  But how would they know? I’ve got my ideas on search, and I think that search is the wrong analogy.  When you search, you are often alone, un-aided.  Having to make choices on your own, with no guidance.

I want an intelligent interactive conversational search.  Much like I have with people.  For example I don’t want "Results 1 - 10 of about 26,400,000 for pyramid [definition]. (0.05 seconds) ".   (BTW, why would I care how long it took?)

Funny thing, on my search for Pyramid on Google the #2 listing is a brewery. I’ll bet their PR people are working hard to knock Wikipedia off the top slot! And for a global resource based on ‘human interaction’ Google ranks the USDA food pyramid #3. I’m thinking it’s time that our kids be assigned more homework, and get the pyramids of Giza (currently query result #9) moved up a little in the Google ratings…

Back to the topic at hand.  I’m more into a conversation.  One of the problems of search - is determining the context of a search.  Why is this so difficult? Context can be easily determined with a simple conversation.

The pyramid I was looking for was a computer company in San Jose that I used to work for.

If the conversation went something like this:

Me: Pyramid

SearchEngine: So, with 26 million hits, the primary topics are some big honkin structures in Egypt, beer, the USDA food thing and the mathematical structure. Is it one of those? If not, can you give me some hints - think time, place, related items.

Me: San Jose

SearchEngine: Ok, I think we’re getting closer.  Less than a million hits. First, you mean "San Jose" - like the place right?

Me: Yup

SearchEngine: I’ve found a number of businesses in San Jose, and some news articles. If it’s a business, what type of business? If it’s news related - do you know the time, or anything else about pyramid that might be in the story? If it’s neither, then I’m going to need another hint. Is it maybe something related to supernatural activity?

Me: Computer business

SearchEngine: Great - I’ve found a number of references to a Pyramid Technologies, a computer company that used to be in business in San Jose, CA.  Is that correct?

Me: Yes

SearchEngine: Now we’re getting close I’ll bet.  What about that company are you interested in?  I have news releases, old documentation, people that used to work there, archives of the old web-sites, some pictures, public filings…

Me: …

So, it’s kind of like 20 questions.  I don’t expect to hit on the first query, but hey - give me some help and direction.  Or really, the conversation is like what I would have with a librarian, or research assistant. 

And once I’ve found what I’m after, it would be nice if they could suggest some follow-on information that may be of interest. One of my favorite things when I’m doing research is not finding what I’m after (that’s expected), but finding other interesting, somehow related items. So, search engines - help me find what I’m after, and some other good stuff along the way!

August 3rd, 2007 Posted by Clark Hodge | Uncategorized | one comment

When preservation efforts go VERY VERY wrong

Jones Canyon Preservation Catastrophe

Just a few days ago, I was rafting  with my two sons and my father - down Green River with Hatch, one of the oldest rafting companies in the western US.  We hiked to an archeological site in Jones Canyon (in Dinosaur National Monument).  The site was on a trail about 2 miles from the Green River, another 2 miles from the closest road, and several hundred yards from a small stream.  Several distinct societies had occupied this site. 

A number of years ago, archeologists had worked to excavate the site and recover invaluable artifacts.  Towards the end of the work, a flash flood hit, destroying the camp where all of the artifacts had been placed.  Nothing was recovered.  The original site from where they pulled the artifacts was left untouched.  Everyone must have been asking ‘what if…’.

Lessons learned that afternoon:

  • Even with the best of intentions, bad stuff happens.
  • Don’t wait to make backup copies.
  • Don’t put off getting data into other media.
  • Mistakes will be made.  Don’t let that hinder future efforts.
  • Get outside, enjoy what we’ve got, take a vacation, dump the cell phone and the PC.
  • Don’t take anything too seriously - the world will go on.

Note that the above image is not from Jones Canyon; there were a number of similar pictographs there, but I didn’t have a camera with me. This pictograph is from Newspaper Rock St. Park in Utah.

July 19th, 2007 Posted by Clark Hodge | Uncategorized | one comment

Dissension in the readers…

Zatz commented on last week’s posting re: not encrypting anything. His comment is on the post in its entirety, but let me comment on a few things Zatz said. My point was that we should be ‘thinking’ about what, when, and how we encrypt. Right now, we are pushing towards lots of encryption, but in many cases, that encryption can cause long term, irrecoverable damage to valuable data assets.

There is no such thing as unbreakable encryption. That said, data loss is more likely than a company paying to have industry standard encryption broken for them because they lost their private key.

Understood, but that’s my point - lost keys mean lost data. No algorithm / implementation is unbreakable, but the cost and time to break it can be so high that for practical purposes it is unbreakable, and the data is subsequently lost. That being said, one of the things I’ve always found interesting is that there is a statistical probability that any encryption could be broken on the first try. Not very likely, but it could.
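For a sense of scale, here is a small Python sketch of the brute-force arithmetic. It assumes the only attack is guessing keys at random from a keyspace of 2^n equally likely keys, and the rate of one billion guesses per second is a number picked out of the air for illustration, not a measurement of any real hardware.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def first_try_probability(key_bits: int) -> float:
    """Chance that a single random guess hits an n-bit key: 1 in 2^n."""
    return 2.0 ** -key_bits


def average_years_to_break(key_bits: int, guesses_per_second: float) -> float:
    """Expected brute-force time: on average about half the keyspace is searched."""
    expected_guesses = (2 ** key_bits) / 2
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical guesses per second
    for bits in (40, 64, 128):
        print(
            f"{bits}-bit key: first-try chance {first_try_probability(bits):.3g}, "
            f"average {average_years_to_break(bits, rate):.3g} years at {rate:.0e}/s"
        )
```

The first-try chance is never zero, which is the curiosity above, but every extra key bit halves it and doubles the average wait.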

The solution is to choose the right level of encryption. The main benefit of encrypting backups is (theoretically) avoiding sensitive data disclosure to hackers and thieves. Even when a tape gets stolen or lost, you don’t have to publicly disclose that you lost client data. This benefit can still be achieved with less robust encryption- maybe even something with a known vulnerability.

Ok, here’s where I’d say that the law is, uh, misguided. If there’s no standard for the level of encryption, then isn’t encoding to ASCII, by its very nature, a very simple substitution encryption scheme? Ok, that’s weak, how about ROT13? A little stronger, but not much.
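To show just how little protection that is, here is a short Python sketch using the standard library’s `rot_13` text codec. The sample string is made up. Note that there is no key at all: applying the same transform a second time gives back the original, so anyone who recognizes the scheme has already ‘broken’ it.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; everything else is left alone."""
    return codecs.encode(text, "rot_13")


def ascii_codes(text: str) -> list:
    """'Encrypt' by ASCII encoding: a fixed, public substitution table."""
    return list(text.encode("ascii"))


if __name__ == "__main__":
    secret = "Client list on tape 42"
    scrambled = rot13(secret)
    print(scrambled)
    print(rot13(scrambled))       # the same call undoes it, no key required
    print(ascii_codes("tape"))    # numbers instead of letters, equally public
```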

The reality of the situation is that the benefit of putting a heavy encryption system between your data and a hacker is only slightly better than putting a light one. They both force the hacker to become more than casually involved- he has to find a way to read the tape, and analyze the cypher-text to see if it is breakable. I’ll bet that many casual data thieves who look at an encrypted tape probably give up without even trying to break it.

But security by a little obscurity has never been a safe bet.

Instead of not encrypting your data, use 64 bit keys on your backups- they’re fairly easy to break if lost, and to the attacker who doesn’t know that it’s 64 bit, it might be just as intimidating as strong encryption. Definitely more intimidating than plain-text…

But not much more so than ROT13.  Is the game intimidation, or protection?  Serious protection would require serious encryption right?

This principle is used by many security conscious organizations- it’s called "pretty good security": good enough to deter all but the most determined, who probably would cost too much to deter anyways.

I guess, IMHO that I’d say that you either go whole hog, or not at all.  If you use weak protections, you’re going to be more likely to have the data broken into.  Maybe less likely than if unencrypted.  But by how much is the question.

While legally you may not have to disclose a loss if you were using some encryption, wouldn’t having to disclose the media loss, the fact that you used weak encryption that was easily broken, and that you didn’t disclose the loss promptly be even more embarrassing and costly? Let’s do more for the protection of data - both long and short term - not less. Along with the concept of pretty good security, there needs to be a concept of ‘good enough security’ - and that’s not just about the encryption.

July 19th, 2007 Posted by Clark Hodge | Uncategorized | one comment

In conflict with my previous post - don’t encrypt anything!

Ok, so this may cause a little consternation with my previous post.  That’s where I encouraged everyone to encrypt everything everywhere.  Now I’ll tell you to not encrypt anything.  I guess the encryption thing is getting a bit too popular - so I’ve got to take the opposite stance.

Rule 1 of IT: NEVER LOSE DATA!

Rule 2 of IT: protect data from all BAD STUFF!

What happens when you lose a key: DATA LOSS.

What happens when you lose access to the program that did the encryption: DATA LOSS.

What happens when there is a programmatic error by the encryption software: DATA LOSS.

What happens when encryption protection doesn’t work as well as it should / as well as the brochure says: potential data leak (read ‘BAD STUFF’).

Encrypting data adds an additional layer of complexity and potential for data loss.  Don’t do it! 

In addition, data, over time, often changes from private and proprietary to free, public and open. This happens all the time. If decryption is cumbersome or impossible, it may not be reasonable to expect the future data stewards to go through the effort of decrypting the information. Data stewards, whether they be in IT or records management, must maintain an understanding of the assets they protect, of the value they have today, and of what the potential future value is.

Data loss, is unacceptable.  Refer to rule 1.

There are ways other than encryption to protect data. Maybe we should look at those, and determine where our energies should go. Of course if you don’t encrypt - you’d better have those other methods in place to protect data in its native state. And I know that ‘your’ security systems are flawless, right?

July 11th, 2007 Posted by Clark Hodge | Uncategorized | one comment

Sign and Encrypt Everything, Everywhere

A while back I talked about storing everything forever.  Now I want to propose that everything (digital) everywhere be signed and encrypted. 

Everywhere…  in Storage (disks, tapes, sticks…), in Transit (On the network, between devices…).

Encrypting everything everywhere only reduces the threat. It doesn’t (can’t) guarantee security. Encryption can be broken, data can be grabbed while it’s unencrypted. Until operating systems are secure - the more you can do to segregate data the better. (There’s a whole other reason for avoiding virtualization - but that’s another article.)

 We should accept as fact that anything not encrypted has a high probability of being released ‘into the wild’.  We should also assume that unsigned information has been tampered with.

Decryption should occur only at the point of need, with the signature validated, and when the operation on the data is complete a secure cleansing procedure should be executed immediately. All of this should run within a secure OS (which of course is totally signed/encrypted for integrity) which doesn’t leave things lying around in memory, nor on disk in ‘temp’ files, swap areas, hardware and software caches etc.

Signing everything allows for us to ensure data integrity.  Is it really what it proclaims to be?  If it’s not signed - don’t trust it!  I still don’t understand how an OS allows modifications by unsigned agents.  No wonder things are such a mess.
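As a rough illustration of ‘if it’s not signed - don’t trust it’, here is a minimal Python sketch. It uses an HMAC from the standard library as a stand-in: that is a shared-secret integrity tag, not a true public-key digital signature, and the key and sample data below are made up. A real deployment would want proper signatures and real key management.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Produce an integrity tag for data (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_trusted(data: bytes, tag: str, key: bytes) -> bool:
    """Accept data only if its tag checks out; compare in constant time."""
    return hmac.compare_digest(sign(data, key), tag)


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    original = b"pay vendor 100 dollars"
    tag = sign(original, key)

    print(is_trusted(original, tag, key))                     # untouched data
    print(is_trusted(b"pay vendor 900 dollars", tag, key))    # tampered data
```

Anything that arrives without a tag, or with a tag that fails the check, gets treated as tampered with.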

Sure, it may run a little slower, but what price have we as a society had to pay for the lazy approach the industry has taken to the security of data?  OS’s are hacked, IDs are stolen, web-sites hijacked, data corrupted, personal information compromised etc.  Not taking a proactive step is not free, far from it. 

June 22nd, 2007 Posted by Clark Hodge | Uncategorized | no comments

Who can I trust?

I’m a big believer in encryption. 

But, who do you trust?  Real cryptographers would say don’t trust anyone.  Independently verify trust.  So how do I buy ‘encryption’ products that can’t be independently verified?  Even the security of good encryption algorithms can be destroyed with a poor implementation. The algorithms must be verified and the implementation (actual code) must be openly verified, and independently re-constructable. 

I was recently at EMC World and EMC is now (via the RSA acquisition) smack dab in the middle of the security business.  I had the opportunity to spend some time after one of the sessions speaking with John Linn, a senior consulting architect for RSA.   They don’t release their code openly.  So, can we trust them? 

Sure they claim that they use industry standard / military grade / really good encryption, but unless someone independently verifies not only the encryption used but also the code, then I’m not convinced that it’s secure.  If they claim certification by some other body, how can I trust them?  I’m sure they could be ‘bought’ or coerced. 

With all of the push to encrypt more and more, vendors are working to answer that need.  Unfortunately there aren’t too many real cryptographers out there.  The customer base is not well educated in this either.  This is not an easy problem.  It’s hard enough when the algorithms are in software that runs on standard hardware, but what about when it’s embedded into firmware - specific to a device?  Who’s checking that?

The vendors say ‘trust us’.  I ask ‘why should I?’.

June 12th, 2007 Posted by Clark Hodge | Uncategorized | no comments

Data half useful? And sales double speak to even things up.

The half:

The Register published an article referring to a study that states that half of things stored won’t ever be accessed.  So what?  Unless you can tell me which half won’t get accessed EVER, (and guarantee that), then who cares?

Some might think that if they throw out the half that won’t get accessed they’ll save a ton.  But it’s not a matter of knowing which half won’t be accessed - it’s knowing that you can’t know.

What would be interesting is whether better tools (and tools include the server, the storage, and the applications) would change this percentage.  I would predict that yes - we would do more interesting things with the data.  Search functions have brought out a lot of things that otherwise had been left buried.  Give us more advanced tools and we’ll discover all kinds of ‘nuggets’ in the data that otherwise would go ‘un-accessed’.

Now, the double speak…

From the same article, a vendor sales director states "Even though hardware storage costs per gigabyte are dropping, the costs of managing storage are skyrocketing."  By what metric?  If we’re pushing budgets with management, yes, but our cost to manage a byte of information has decreased hugely with better storage and backup software.  Not sure though, maybe those costs are just being recognized as a separate entity now.  But he goes on to say "It makes more sense to move data to lower cost alternatives instead of struggling with storage problems."

Huh?  So, is he implying that the movement of data to those other platforms is free?  Gee, recently I was involved in a data migration that sure seemed pretty costly.  And is he implying that the lower cost alternatives have fewer storage problems and require less management?  Sure looks to me that he’s adding complexity, and additional management costs.

So, if I take half my data and run it through the doublespeak - do I get the newest de-dupe engine?  Help me, I’m so confused…

May 17th, 2007 Posted by Clark Hodge | Uncategorized | no comments

Mashed Records Management?

How do we handle record keeping in time sequenced events from different sources?  The hi-tech industry is all excited about the idea of mashups.  Simply - the putting together of data from multiple sources and repurposing it.

Now, for records management purposes, mashups are going to introduce a number of problems.  Actually, the problems have already existed, but mashups will make them more prevalent.

A few years back, I was working for Xerox on early windowed computing environments.  One of our big pitches was the ‘mashup’.  Of course we didn’t have that cool word back then, but the concept was that instead of having a TTY window into the DEC PDP 11/08, a 3270 window into the IBM MVS mainframe, a PC running Lotus 123, and a local workstation with its own applications - you’d have a single workstation with ‘windows’ into each of these environments.  For the advanced folks we even had ‘rooms’ of affiliated mashups for the multiple jobs a multi-tasker does.  Personally, I like the term and analogy of ‘rooms’ better than the currently in vogue ‘virtual desktops’.

So why are mashups so tough for records managers?  Because we are working in a fast paced multi-tasking world, where not only is any event important, but its sequence in relation to other events is also key.  In current systems time concepts are poor at best.  There is no auditing of time functions.  Clocking is poor.  My computers aren’t synced (reliably) well with others, nor are the clocking mechanisms secure.  If I want to fake a time - it’s as easy as changing the clock.  There are no audit trails, nothing to show that the time is accurate.

So, if I have to reconstruct an event from the records created by different systems around that event, how can I be sure of what happened when, and in what relation to other events?

Think of the steps to make adjustments to the rods in a nuclear reactor.  They are checking multiple systems, validating multiple steps, obtaining a hierarchy of approvals, performing specific sequences of steps, each with its own ‘records capture’ function.  Or of an executive selling stock - was it before or after he discovered the impending FDA drug approval?  If I have to go back and reconstruct events from the records how do I deal with mashup data?

Accurate, trusted time has to be part of the records collection process.  NIST offers reliable time, though not perfectly secure.  Canada offers a ‘signed’ time service, a step up.  But are you using these services?  Few are.  And even fewer have integrated ‘time processes’ into their records management programs.  Has it been a problem yet?  No, but as counsel becomes more sophisticated - the simple time stamp on an e-mail will not be deemed credible.
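As a rough idea of what a ‘time process’ in a records program could look like, here is a small Python sketch of a hash-chained event log.  Everything in it is an assumption for illustration: the field names, the function names (append_event, chain_is_intact) and the sample events are made up, and it is not how any particular records product works.  Each entry carries a timestamp and the hash of the entry before it, so changing a time after the fact breaks the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder 'previous hash' for the first entry
FIELDS = ("source", "timestamp", "description", "prev")


def _digest(body: dict) -> str:
    """Hash an entry body in a stable (sorted-key) JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: list, source: str, timestamp: str, description: str) -> None:
    """Append an event that is chained to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"source": source, "timestamp": timestamp,
            "description": description, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})


def chain_is_intact(log: list) -> bool:
    """Recompute every hash; any edited field or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        body = {name: entry[name] for name in FIELDS}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_event(log, "mail-server", "2007-05-11T09:00:00Z", "e-mail received")
    append_event(log, "trading-desk", "2007-05-11T09:05:00Z", "sell order entered")
    print(chain_is_intact(log))
    log[0]["timestamp"] = "2007-05-11T09:10:00Z"  # back-dating attempt
    print(chain_is_intact(log))
```

Note what this does and doesn’t buy you.  It shows that a recorded time wasn’t changed later; it says nothing about whether the clock was right when the entry was written, which is where a trusted time source comes in.  And someone who can rewrite the whole log can rebuild the chain, so the latest hash would need to be lodged somewhere outside their control.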

May 11th, 2007 Posted by Clark Hodge | Uncategorized | no comments