Brainfart Saturday

I need to switch coffee brands…

Yesterday, for no apparent reason, I thought it might be a good idea to create a file transfer app that asynchronously sends files in slices (chunks) while still ensuring the receiving party’s checksum matches the sender’s. I was planning on adding public-key security to the whole thing, but I can’t seem to get past step 1 without issues.

I tried splitting a file into slices and merging them back together right away, and it seems to work just fine for small files.

Blob blob = FileUtils.GetBlob("C:\\Users\\Portable\\Downloads\\smalldoc.pdf");

FileUtils.SplitSlices(ref blob);

// Change the filename
blob.Path = "C:\\Users\\Portable\\Downloads\\smalldocCopy.pdf";
// Merge the slices back into one under the new filename
FileUtils.MergeSlices(ref blob);

The head-scratching starts when splitting and merging a large-ish file (50 MB+). The “Size on disk” is identical to the original’s, but the “Size” is smaller than the original’s. In other words, it’s taking up the same disk allocation, but some bytes got lost along the way. The funny thing is that if I then split the merged copy and merge it again into another copy, this third copy is identical to the second. So the original is still the odd one out.
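A quick comparison along these lines (just a sketch, reusing the GetChecksum helper from the FileUtils class further down and the example paths from above) makes the mismatch easy to see outside of Explorer’s properties dialog:

// Compare the original and the merged copy directly (swap in whichever file you're testing)
string original = "C:\\Users\\Portable\\Downloads\\smalldoc.pdf";
string copy = "C:\\Users\\Portable\\Downloads\\smalldocCopy.pdf";

Console.WriteLine("Original length: {0} bytes", new FileInfo(original).Length);
Console.WriteLine("Copy length:     {0} bytes", new FileInfo(copy).Length);

Console.WriteLine("Original SHA-256: {0}", FileUtils.GetChecksum(original, "sha256", true));
Console.WriteLine("Copy SHA-256:     {0}", FileUtils.GetChecksum(copy, "sha256", true));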

I can’t seem to find a reason for this, other than that I’m missing something really obvious or that it’s a platform issue. I hope it’s the former, because cursing at the latter feels… weird.

Here’s the “Slice” class where data would be stored and sent/received async.

public class Slice
{
	// Slice Id (Checksum / Currently not used)
	public string Id { get; set; }
	
	// File(Blob) Id (Checksum)
	public string SourceId { get; set; }

	// Blob location index
	public int Index { get; set; }

	// Slice byte length
	public int Size { get; set; }

	// Slice data
	public string Data { get; set; }

	// Whether this slice has been written out yet (set by WriteSlice)
	public bool Complete { get; set; }

	public Slice()
	{
		Complete = false;
	}
}

And the “Blob” class that uses the above slice(s).

public class Blob
{
	// File Id (Checksum)
	public string Id { get; set; }

	// Slice collection
	public SortedDictionary<int, Slice> Slices { get; set; }

	// Save path
	public string Path { get; set; }

	// File size
	public int Size { get; set; }

	// Assembled file size
	public int CompletedSize { get; set; }

	public Blob()
	{
		Slices = new SortedDictionary<int, Slice>();
		Size = 0;
		CompletedSize = 0;
	}
}

And of course, the uglier-than-sin FileUtils class (those with weak hearts, avert your eyes).

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class FileUtils
{
	// Default slice size (64 KB)
	private static int _blockSize = 65536;

	public static void SplitSlices(ref Blob blob)
	{
		FileInfo info = new FileInfo(blob.Path);
		string source = info.FullName;
		string dir = info.DirectoryName;

		using (FileStream fs = new FileStream(source, FileMode.Open, FileAccess.Read))
		{
			foreach (KeyValuePair<int, Slice> kv in blob.Slices)
			{
				Slice slice = kv.Value;
				byte[] data = new byte[slice.Size];
				int read = 0;

				fs.Seek(slice.Index, SeekOrigin.Begin);
				if ((read = fs.Read(data, 0, slice.Size)) > 0)
				{
					WriteSlice(ref slice, data, dir);
				}
			}
		}
	}

	public static void WriteSlice(ref Slice slice, byte[] data, string dir)
	{
		string slicePath = SourceFromSlice(slice, dir);
		using (FileStream ofs =
			new FileStream(slicePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
		{
			ofs.Write(data, 0, slice.Size);
			slice.Complete = true;
		}
	}

	public static void MergeSlices(ref Blob blob)
	{
		FileInfo blobInfo = new FileInfo(blob.Path);
		string dir = blobInfo.DirectoryName;

		using (FileStream outfs =
			new FileStream(blobInfo.FullName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
		{
			foreach (KeyValuePair<int, Slice> kv in blob.Slices)
			{
				Slice slice = kv.Value;
				if (slice.Complete)
				{
					byte[] bytes = ReadSlice(ref slice, dir, true);
					outfs.Seek(slice.Index, SeekOrigin.Begin);
					outfs.Write(bytes, 0, slice.Size);

					// Update the completed count
					blob.CompletedSize += slice.Size;
				}
			}
		}
	}

	public static byte[] ReadSlice(ref Slice slice, string dir, bool delAfterReading)
	{
		int read = 0;
		byte[] data = new byte[slice.Size];
		string slicePath = SourceFromSlice(slice, dir);

		using (FileStream ifs = new FileStream(slicePath, FileMode.Open, FileAccess.Read))
		{
			read = ifs.Read(data, 0, slice.Size);
		}

		if (delAfterReading)
			File.Delete(slicePath);

		return data;
	}

	public static void InitBlob(ref Blob blob)
	{
		int sliceCount = 0;
		int sliceSize;

		// Catch remaining byte length after splitting
		int remainder = (blob.Size > _blockSize) ? (blob.Size % _blockSize) : 0;

		// If this is a big file that can be split...
		if (blob.Size > _blockSize)
		{
			sliceCount = blob.Size / _blockSize;
			sliceSize = blob.Size / sliceCount;
		}
		else // Slice size same as blob size and only one slice needed
		{
			sliceCount = 1;
			sliceSize = blob.Size;
		}

		for (int i = 0; i < sliceCount; i++)
		{
			Slice slice = new Slice();
			slice.SourceId = blob.Id;
			slice.Size = (i == 0) ? sliceSize + remainder : sliceSize;
			slice.Index = i * slice.Size;

			blob.Slices.Add(slice.Index, slice);
		}
	}

	public static Blob GetBlob(string source)
	{
		Blob blob = new Blob();
		FileInfo info = new FileInfo(source);

		blob.Id = FileId(source);
		blob.Size = LengthToInt(info.Length);
		blob.Path = info.FullName;
		blob.CompletedSize = LengthToInt(info.Length);

		InitBlob(ref blob);
		return blob;
	}

	public static string GetChecksum(string source, string mode = "md5", bool isFile = false)
	{
		byte[] bytes = { };
		Stream fs;

		if (isFile)
			fs = new BufferedStream(File.OpenRead(source), 120000);
		else
			fs = new MemoryStream(Encoding.UTF8.GetBytes(source));

		switch (mode.ToLower())
		{
			case "sha1":
				using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
					bytes = sha1.ComputeHash(fs);
				break;

			case "sha256":
				using (SHA256CryptoServiceProvider sha256 = new SHA256CryptoServiceProvider())
					bytes = sha256.ComputeHash(fs);
				break;

			case "sha512":
				using (SHA512CryptoServiceProvider sha512 = new SHA512CryptoServiceProvider())
					bytes = sha512.ComputeHash(fs);
				break;

			case "md5":
			default:
				using (MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider())
					bytes = md5.ComputeHash(fs);
				break;
		}

		// Cleanup
		fs.Close();
		fs = null;

		return BitConverter
			.ToString(bytes)
			.Replace("-", "")
			.ToLower();
	}

	private static int LengthToInt(long length)
	{
		return (int)Math.Ceiling((double)length);
	}

	private static string FileId(string source)
	{
		return GetChecksum(new FileInfo(source).FullName, "sha256", true);
	}

	private static string SourceFromSlice(Slice slice, string dir)
	{
		return dir + "\\" + slice.SourceId + "_" + slice.Index + ".slice";
	}
}
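
If anyone feels like poking at this, one sanity check worth running (just a rough sketch built on the classes above, not part of the app itself) is to confirm that the slices InitBlob produces actually tile the file from byte 0 to blob.Size with no gaps or overlaps:

// Sketch: verify that a blob's slices cover the whole file, end to end, with no gaps or overlaps
public static void CheckSliceCoverage(Blob blob)
{
	long expected = 0;

	foreach (KeyValuePair<int, Slice> kv in blob.Slices)
	{
		Slice slice = kv.Value;

		if (slice.Index != expected)
			Console.WriteLine("Slice at {0}: expected to start at {1} (gap or overlap)",
				slice.Index, expected);

		expected += slice.Size;
	}

	if (expected != blob.Size)
		Console.WriteLine("Slices cover {0} bytes, but the file is {1} bytes", expected, blob.Size);
}

Running that right after GetBlob on anything bigger than one block should show whether the slice map itself is off, or whether the bytes go missing later during the actual split/merge I/O.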

I just found the oldest file on my computer

Named “DFIX.EXE” and last modified December 10, 1986, 1:44 AM, I can’t remember what it is or what it belonged to originally, but it was probably from one of those DOS apps and games that got passed around while I was in school. Uh, yeah, this was before “filesharing” was taboo, anti-virus was hardly common, and Napster didn’t exist. The “Created” date is listed as 2001, which means it had a rather interesting journey.

This was probably copied from a much older floppy to my old Packard Bell PC (the brand no longer sells in the U.S.) that I got in the mid ’90s, with the then-state-of-the-art Windows 95, just as I was entering high school. I used this PC for most of my schoolwork, games, MP3s and surfing until graduation. It was, cue massive nostalgia, the same PC that I used to set up a small web server and start a little community hub called Ghostnetworks.

The PC went through a Win 98 upgrade, at which point the file was probably copied over, along with my entire (“gasp”) 2GB worth of “stuff”, from the Packard Bell’s 4GB drive onto my Dell Inspiron 8200 laptop with its stunning 60GB capacity. This was so I could use the PC as a dedicated server running Apache for GN. The Inspiron served me well (it still works!), but it was starting to lag behind on my work and it was pretty heavy at almost 8 lbs.

I didn’t carry the laptop around all that much, and this file, among countless others, was long forgotten in a “backup” folder in My Documents. I then got an eMachines box (can’t remember the exact specs) in the early 2000s, on sale at our local Staples, and it received the entire hard drive contents of the Dell (about 20GB worth of files “created” on the new PC). This file, too, sat there for many moons.

All of 2004 went by before I cracked open the old eMachines again and, lo and behold: old stuff!

This file, along with the rest of the super-massive 60GB that had accumulated on the eMachines, ended up on a second custom PC and its 500GB drive, which I got from a reseller after the dot-com bust. And there it sat again until the drive was moved to yet another custom PC, and then another, until finally, after the drive started making the proverbial death clicks, the file was moved to its final (maybe) resting place on a new-ish 1TB WD drive, almost 14 years later. My, how far it’s travelled.

All this just goes to show how far our digital legacy has come, and I’m sure some people have older, “proper” documents and files that have survived to this day.

Of course, this is far from the oldest file out there. On the Internet, at least, I came across a pretty old W3C document (RTF) that discusses a proto-WWW by Tim Berners-Lee; its last-modified date is August 1990.

Anyone else come across a piece of digital nostalgia?