Propose a way to store BigDecimal in SQL tables #19
Replies: 6 comments 2 replies
-
Being able to
-
With the suggested approach, binary-based ORDER BY works perfectly.
-
Hi! Hmm, turning a floating-point number into a sortable representation sure is an interesting problem. Not so easy! I'm presuming the intent here is to be able to transform the BigDecimal into a representation such that the result of performing the ORDER BY command on it is the same as if ordered by the usual less-than (<) or greater-than (>) relation. I tried my hand at this, and I came up with an approach, which I will detail in a separate comment following this one, but it has some caveats and tradeoffs, so it's likely your approach is still superior. I wanted to test your approach, but I had a few questions/clarifications about implementing it. My questions below are highlighted in bold. First things first, in your numbers above, for example:
The comma (,) character here indicates the decimal place (NumberFormatInfo.NumberDecimalSeparator in your current culture), right? I didn't catch on to that at first, but after trying to follow your example, I conclude this is probably the case. I just wanted to clear up that ambiguity. Can you confirm? Moving on...
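As a small illustration of that culture dependence (this snippet uses double.Parse purely for demonstration, not the library, and needs System.Globalization):

Console.WriteLine(CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator); // "." or "," depending on the current culture
double value = double.Parse("11,123", new CultureInfo("de-DE")); // in de-DE the comma is the decimal separator
Console.WriteLine(value.ToString(CultureInfo.InvariantCulture)); // 11.123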
1) When you say sign here, you mean the sign of the number, right? Not the sign of the exponent? (Because the exponent can be negative, indicating a shift of the decimal point to the left, while the number is still a positive value, e.g. 0.00001.)

2) I'm not sure what you mean by biased exponent here. I mean, I know that an exponent bias is a value usually added to the exponent to make it non-negative. But part of my confusion is that the exponent is a 32-bit signed integer (4 bytes), not the 2 bytes that you are calling for here. What are we adding to (or subtracting from, or multiplying with) the exponent here, and what are we doing with the other 2 bytes? I guess we are retaining only the 2 least significant bytes?
- 14 bytes (112 bits) hold ≈ 34 digits, so this must be the "34 digits for whole part" that you mention just below the example?
- 13 bytes (104 bits) hold ≈ 32 digits, so this must be the "32 digits for fractional" part that you mention just below the example?

Finally, are we wanting a helper method that puts a BigDecimal instance into this format for easy insertion into an SQL table? I recently saw on another project that it's possible to have a community discussion board for a project. That would probably be a good place for information and discussions like this to live.
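Here is a small sketch of what I'm guessing the biased exponent and the digit counts mean. The bias value and byte counts are taken from the constants posted elsewhere in this thread (conversionBiasedMidPoint = 16384, 14 bytes for the whole part, 13 bytes for the fraction), so treat this as an assumption, not a confirmed reading:

// Bias the exponent so the stored value is non-negative, then keep only the low 2 bytes.
int exponent = -5; // example exponent
int biased = exponent + 16384; // conversionBiasedMidPoint
byte[] exponentBytes = { (byte)(biased >> 8), (byte)biased }; // 2 bytes, most significant first

// Digit capacity of an n-byte unsigned integer is roughly n * 8 * log10(2) digits.
int wholeDigits = (int)(14 * 8 * Math.Log10(2)) + 1; // 34
int fractionDigits = (int)(13 * 8 * Math.Log10(2)) + 1; // 32
Console.WriteLine($"{wholeDigits} / {fractionDigits}");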
-
Notice: I converted this issue into a discussion. I'm thinking discussions will be better suited for feature requests, questions, and discussions than having people create an "issue" to do that. Using issues for these purposes always seemed hacky. Issues should be actionable. Issues are like a ticketing system: when one is created, an action needs to take place--like a code change or bug fix--and once completed to satisfaction, the issue can be marked done. I'm not aware of Microsoft or anybody taking metrics on how long issues stay open, or if that information is even available to query via the API or whatnot, but it's a valid metric for judging the health of a project, and it's conceivable that the average time a ticket stays open could be a metric that starts showing up under the Insights tab in the future.
-
Intro

So I took a crack at this. Let's define exactly what the goal is and some (ideal) requirements (please let me know if what you had envisioned is something other than what I've laid out here).

Goal (high-level description)

We want a representation of a BigDecimal instance that is of a type that can be put into SQL and that can be used by the ORDER BY command. In my mind, an ideal solution would have the following properties/features, ordered roughly from most important to least:

1. Ordering: sorting by the stored representation (e.g. with ORDER BY) gives the same order as sorting the original BigDecimal values numerically.
2. Losslessness: the original BigDecimal can be restored from the stored representation without any loss of precision.
3. Single value: the representation is a single value, so it fits in a single column.
An aside as to why

There are two parts to this.
Why you can't just simply order by the exponent and then the mantissa, or vice versa

This is not too hard to see. Consider these two values that are very close to each other in value:

100000
100001

In the above example, if these numbers were parsed into BigDecimals, the first number's mantissa would be 1 and its exponent would be 5, while the second number's mantissa would be 100001 and its exponent would be 0. So despite being only 1 away from each other, their mantissas and exponents are quite different values.

Why a monotonically increasing (or decreasing) function across the entire input range might be hard to find

The range of possible values of a BigDecimal instance can be divided into a number of discrete domains. Let's list them in natural order, from smallest to largest:

- large negative values: (-∞, -1]
- small negative values: (-1, 0)
- zero
- small positive values: (0, 1)
- large positive values: [1, +∞)

The goal here is to find or make a single function that can take a number from any of these ranges and return a value whose natural ordering (by <, say) is isomorphic to the reals. Said differently: we can think of the function we seek as mapping input values to output values. The order of each element relative to every other element needs to remain the same after mapping as it was before mapping.
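To make that last property concrete, here is a small sketch of a check one could run against any candidate mapping. The helper is hypothetical (not part of the library) and assumes the input and output types have a default comparer:

// Returns true if, for every pair of inputs, the outputs compare the same way the inputs do.
public static bool PreservesOrder<TIn, TOut>(Func<TIn, TOut> map, IReadOnlyList<TIn> values)
{
    Comparer<TIn> inputComparer = Comparer<TIn>.Default;
    Comparer<TOut> outputComparer = Comparer<TOut>.Default;
    for (int i = 0; i < values.Count; i++)
    {
        for (int j = i + 1; j < values.Count; j++)
        {
            int inputOrder = Math.Sign(inputComparer.Compare(values[i], values[j]));
            int outputOrder = Math.Sign(outputComparer.Compare(map(values[i]), map(values[j])));
            if (inputOrder != outputOrder)
            {
                return false;
            }
        }
    }
    return true;
}

A mapping that satisfies the ordering goal should make PreservesOrder return true for the test values listed further down.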
Finding such a function is the hard part. In particular, most functions' output will behave differently when the input crosses over zero and into the negative numbers. Often, the sign, magnitude, and value being approached (increasing or decreasing) will be symmetric about the axis, meaning that if the outputs were increasing as you approached zero from the positive side, they will be decreasing as you proceed from zero into the negatives. Some functions, like logarithms and roots, will start returning complex numbers when the input becomes negative (mathematically speaking, anyway; in C# code, Math.Log10 will return NaN for all negative input values).

Tests

Test Values

💡 Try this out yourself.

BigDecimal[] testValues = new BigDecimal[]
{
BigDecimal.Parse("-1000000000"),
BigDecimal.Parse("-100"),
BigDecimal.Parse("-1"),
BigDecimal.Parse("-0.1"),
BigDecimal.Parse("-0.001"),
BigDecimal.Parse("-0.0000000001"),
BigDecimal.Parse("0"),
BigDecimal.Parse("0.0000000001"),
BigDecimal.Parse("0.001"),
BigDecimal.Parse("0.1"),
BigDecimal.Parse("1"),
BigDecimal.Parse("100"),
BigDecimal.Parse("1000000000"),
};

As you can see, these test values include a value in every domain mentioned above in the aside, and values near the boundaries of those domains. Those domains, once again, are the following ranges: (-∞, -1], (-1, 0), zero, (0, 1), and [1, +∞).

Test Code
public static void MapAndPrint<IN_TYPE, OUT_TYPE>(Func<IN_TYPE, OUT_TYPE> mapFunction, Func<OUT_TYPE, string> printFunction, IEnumerable<IN_TYPE> values)
{
List<OUT_TYPE> results = ApplyMapFunction(mapFunction, values);
PrintValues(printFunction, results);
}
public static void PrintValues<T>(Func<T, string> printFunction, IEnumerable<T> values)
{
foreach (T item in values)
{
Console.WriteLine(printFunction(item));
}
Console.WriteLine();
Console.WriteLine();
Console.WriteLine();
}
public static List<OUT_TYPE> ApplyMapFunction<IN_TYPE, OUT_TYPE>(Func<IN_TYPE, OUT_TYPE> mapFunction, IEnumerable<IN_TYPE> values)
{
List<OUT_TYPE> results = new List<OUT_TYPE>();
foreach (IN_TYPE value in values)
{
results.Add(mapFunction(value));
}
return results;
}

Example mapping code
public static BigDecimal ExampleMappingFunction(BigDecimal input)
{
BigDecimal result = BigDecimal.Normalize(input);
// Do... something
return result;
}
public static string PrintBigDecimal(BigDecimal input)
{
return input.ToString();
}

Example usage:
void Main()
{
// Example usage
MapAndPrint(ExampleMappingFunction, PrintBigDecimal, testValues);
}

Example output
The output is just the input, unchanged. This is because the example mapping function just passes the BigDecimal through unchanged. Make it do something different.

Tip: Your mapping function does not have to return another BigDecimal. It can return a double or a string or whatever. If you want to see a working example that actually does something, check out the solutions I found in the next section, "What I came up with".

What I came up with

Below is my mapping and print function (to work with the test code methods in the previous section):

public static Tuple<int, double> ConvertToSortableTuple(BigDecimal input)
{
BigDecimal value = BigDecimal.Normalize(input);
double result = BigInteger.Log10(BigInteger.Abs(value.Mantissa));
result += (double)value.Exponent;
if (value.Sign == 0)
{
result = 0;
}
else if (value.Sign < 0)
{
result *= value.Sign;
}
return new Tuple<int, double>( value.Sign, result);
}
public static string PrintTuple<T,U>(Tuple<T, U> input)
{
return $"{input.Item1}|{input.Item2}";
}

Example usage:
MapAndPrint(ConvertToSortableTuple, PrintTuple, testValues);

Example output:
So as you may notice, I am not outputting a single value; rather, I am outputting two values--a 2-tuple.

Note: This does not satisfy requirement 3. But it does satisfy requirement 1, and it does satisfy requirement 2. Sort of: there is loss of precision. Considering that the mapping function converts the BigDecimal into a double value, when restoring a BigDecimal from the stored value, the result can be no more precise than a double! I discuss a way to work around this problem in the next section.

I was able to work out a scheme that outputs only a single value, satisfying requirement 3. However, that output does not satisfy requirement 2! Here is the code that successfully turns a BigDecimal into a single orderable value:
public static double ConvertToSortableDouble_ATan2(BigDecimal input)
{
BigDecimal value = BigDecimal.Normalize(input);
double log10 = BigInteger.Log10(BigInteger.Abs(value.Mantissa));
log10 += (double)value.Exponent;
if (value.Sign == 0)
{
log10 = 0;
}
else
{
log10 = Math.Atan2(value.Sign, -log10);
}
return log10;
}
public static string PrintDouble(double input)
{
return input.ToString("##########0.##########");
}

And its output:
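A quick way to check the key numbers in that output (a minimal sketch, not part of the mapping code above):

double nearPi = Math.Atan2(1, -9); // what the mapping produces for an input of 1 billion (log10 magnitude 9, sign +1)
Console.WriteLine(nearPi); // 3.0309354324...
Console.WriteLine(Math.PI - nearPi); // 0.1106572212..., all that is left for every value above 1 billion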
Unless you're a trigonometry buff, you probably are not familiar with the range of Atan2: as the input value approaches infinity, the resulting value approaches Pi! And as you can see, an input of 1 billion already gets us to a value pretty close to Pi: 3.0309354324. This means that all values greater than 1 billion, all the way up to infinity, will be crammed into the remaining 0.110657221189. So as you see, there is a lot of loss of precision for larger values. While it is technically possible to reverse this operation, it sort of breaks down for values with more than a few digits to the left of the decimal point.

Workarounds, Compromises, Solutions

So what can be done? If the intent is to store BigDecimal values in the rows of an SQL table, it is still possible to store a BigDecimal in a row with arbitrary precision and still be able to order by the value stored there: use multiple columns. You can use 1 or 2 columns for ordering by, a column for the exponent, and a column that is a byte array (BINARY) for storing the mantissa. In order to turn the mantissa (a BigInteger) into a byte array, call BigInteger's instance method ToByteArray and then Reverse the byte array:

BigInteger bi = BigInteger.Parse("70001");
byte[] byteArray = bi.ToByteArray().Reverse().ToArray();

Sure, it might not be as elegant as one would like it to be, but it's not the end of the world. Here is a tuple version of the mapping function that returns all of the pieces needed for the multi-column approach:
public static Tuple<bool, double, int, byte[]> ConvertToSortableTuple(BigDecimal input)
{
BigDecimal value = BigDecimal.Normalize(input);
double sortValue = BigInteger.Log10(BigInteger.Abs(value.Mantissa));
sortValue += (double)value.Exponent;
if (value.Sign == 0)
{
sortValue = 0;
}
else if (value.Sign < 0)
{
sortValue *= value.Sign;
}
return new Tuple<bool, double, int, byte[]>(value.Sign == -1, sortValue, value.Exponent, value.Mantissa.ToByteArray().Reverse().ToArray());
}
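To make the multi-column idea concrete, here is a minimal sketch of what the table and the ORDER BY could look like. The table and column names are hypothetical, this assumes SQL Server, and I use the integer Sign from the earlier 2-tuple rather than the bool, so that zero sorts into its own group between the negatives and the positives:

// Hypothetical schema: sortable columns up front, the exact value alongside them.
const string createTableSql = @"
CREATE TABLE BigDecimalValues (
    Id        INT IDENTITY PRIMARY KEY,
    Sign      SMALLINT       NOT NULL, -- -1, 0 or 1
    SortValue FLOAT          NOT NULL, -- approximate log10 magnitude, used only for ordering
    Exponent  INT            NOT NULL, -- exact exponent
    Mantissa  VARBINARY(MAX) NOT NULL  -- exact mantissa bytes (ToByteArray().Reverse())
);";

// Ordering by the sign group first and the approximate magnitude second reproduces numeric order;
// Exponent and Mantissa are only read back when the exact BigDecimal needs to be reconstructed.
const string selectSql = @"
SELECT Sign, SortValue, Exponent, Mantissa
FROM BigDecimalValues
ORDER BY Sign, SortValue;";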
-
I was working on my SQL query executor engine PoC. It parses T-SQL statements, generates an execution plan, and executes it. Here are the conversion constants and methods I use:
private const int conversionBytesForSqlBlob = 32;
private const int conversionBytesForMainPart = conversionBytesForSqlBlob / 2;
private const int conversionBytesForExponent = 2;
private const int conversionBytesForInteger = conversionBytesForMainPart - conversionBytesForExponent;
private const int conversionBytesForFractionPart = conversionBytesForSqlBlob - conversionBytesForMainPart - 2;
private const int conversionBytesForLeadingZeroes = 1;
private const int conversionBytesForFraction = conversionBytesForFractionPart - conversionBytesForLeadingZeroes;
private const int conversionBitsForExponent = conversionBytesForExponent * 8 - 1;
private const int conversionBiasedMaxValue = 32768; //(int)Math.Pow(2, conversionBitsForExponent);
private const int conversionBiasedMidPoint = conversionBiasedMaxValue / 2;
private const int conversionMaxExponent = conversionBiasedMaxValue - conversionBiasedMidPoint;
private const int conversionMinExponent = -conversionBiasedMidPoint;
private const int conversionMaxDigitsInteger = 34;// (int)(conversionBytesForInteger * 8 * Math.Log10(2)) + 1;
private const int conversionMaxDigitsFraction = 32;// (int)(conversionBytesForFraction * 8 * Math.Log10(2)) + 1;
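// Blob layout (32 bytes total), as implied by the constants above and the code below:
//   bytes[0..1]   biased exponent (exponent + conversionBiasedMidPoint), most significant byte first;
//                 bit 7 of byte 0 doubles as the sign flag (set when the value is >= 0)
//   bytes[2..15]  integer part (conversionBytesForInteger = 14 bytes), right-aligned
//   bytes[16]     biased count of the fraction's leading zeroes (conversionBytesForLeadingZeroes = 1 byte)
//   bytes[17..29] fraction part (conversionBytesForFraction = 13 bytes)
//   bytes[30]     number of bytes actually used by the integer part
//   bytes[31]     number of bytes actually used by the fraction part
// For negative values, the integer and fraction bytes are stored bit-inverted so they sort in reverse.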
public static BigDecimal FromSqlBlob(byte[] bytes)
{
if (bytes == null)
{
throw new ArgumentNullException("bytes");
}
if (bytes.Length != conversionBytesForSqlBlob)
{
throw new NotSupportedException($"bytes length doesn't match with ConversionBytesForSqlBlob = '{conversionBytesForSqlBlob}'");
}
var isPositive = (bytes[0] & (1 << 7)) != 0;
var exponentBytes = new byte[conversionBytesForExponent];
Array.Copy(bytes, exponentBytes, conversionBytesForExponent);
exponentBytes[0] = (byte)(exponentBytes[0] & ~(byte)(1 << 7));
var intBytes = new byte[4];
Array.Copy(exponentBytes.Reverse().ToArray(), intBytes, conversionBytesForExponent);
var exponent = BitConverter.ToInt32(intBytes) - conversionBiasedMidPoint;
var integerBytesQty = bytes[bytes.Length - 2];
var integerBytes = new byte[integerBytesQty];
Array.Copy(bytes, conversionBytesForExponent + conversionBytesForInteger - integerBytesQty, integerBytes, 0, integerBytesQty);
if (!isPositive)
{
for (var idx = integerBytesQty - 1; idx >= 0; idx--)
{
integerBytes[idx] = (byte)(~integerBytes[idx]);
}
}
var integer = new BigInteger(integerBytes, false, false);
var bitsForLeadingZeros = conversionBytesForLeadingZeroes * 8;
var leadingZerosMaxValue = (int)Math.Pow(2, bitsForLeadingZeros) - 1;
var leadingZeroesBytes = new byte[conversionBytesForLeadingZeroes];
Array.Copy(bytes, conversionBytesForExponent + conversionBytesForInteger, leadingZeroesBytes, 0, conversionBytesForLeadingZeroes);
intBytes = new byte[4];
Array.Copy(leadingZeroesBytes.Reverse().ToArray(), intBytes, conversionBytesForLeadingZeroes);
var biasedLeadingZeroes = BitConverter.ToInt32(intBytes);
var leadingZeroes = isPositive ? leadingZerosMaxValue - biasedLeadingZeroes : BitConverter.ToInt32(intBytes);
var fractionBytesQty = bytes[bytes.Length - 1];
var fractionBytes = new byte[fractionBytesQty];
var fractionStartIndex = conversionBytesForExponent + conversionBytesForInteger + conversionBytesForLeadingZeroes;
Array.Copy(bytes, fractionStartIndex, fractionBytes, 0, fractionBytesQty);
if (!isPositive)
{
for (var idx = fractionBytesQty - 1; idx >= 0; idx--)
{
fractionBytes[idx] = (byte)(~fractionBytes[idx]);
}
}
var fraction = new BigInteger(fractionBytes, false, true);
var mantissa = default(BigInteger);
if (fraction == 0)
{
mantissa = integer;
}
else
{
var fractionDigits = fraction.CountDigits();
if (exponent == 0 && !integer.IsZero)
{
exponent = -(leadingZeroes + fractionDigits);
}
mantissa = integer * BigInteger.Pow(10, leadingZeroes + fractionDigits) + fraction;
}
if (isPositive)
{
return new BigDecimal(mantissa, exponent);
}
else
{
return new BigDecimal(-mantissa, exponent);
}
}
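// Serializes this value into the 32-byte blob described above, so that comparing two blobs
// byte by byte (e.g. ORDER BY on a BINARY(32) column) matches comparing the numeric values:
// the sign bit is set for non-negative values, the exponent is stored biased, and for negative
// values the digit bytes are bit-inverted so that larger magnitudes sort earlier.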
public byte[] ToSqlBlob()
{
var bigDecimal = new BigDecimal(this.Mantissa, this.Exponent);
if (bigDecimal.Exponent > conversionMaxExponent)
{
bigDecimal = bigDecimal.AlignExponent(conversionMaxExponent);
}
if (bigDecimal.Exponent < conversionMinExponent)
{
throw new NotSupportedException();
}
var exponent = bigDecimal.Exponent;
var integerPart = default(BigInteger);
var fractionPart = default(BigInteger);
if (bigDecimal.Exponent >= 0)
{
if (bigDecimal.Sign == -1)
{
var mantissaBytes = bigDecimal.Mantissa.ToByteArray(false, true);
var mantissaBits = new BitArray(mantissaBytes);
var bytesQtyToSkip = 0;
mantissaBits = mantissaBits.Not();
mantissaBits.CopyTo(mantissaBytes, 0);
if (mantissaBytes[0] == 0)
{
bytesQtyToSkip = 1;
}
integerPart = new BigInteger(new ReadOnlySpan<byte>(mantissaBytes, bytesQtyToSkip, mantissaBytes.Length - bytesQtyToSkip), true, true) + 1;
}
else
{
integerPart = bigDecimal.Mantissa;
}
fractionPart = BigInteger.Zero;
}
else
{
var integerPartDigits = bigDecimal.Length >= 0 ? bigDecimal.Length : 0;
var fractionPartDigits = bigDecimal.SignifigantDigits - integerPartDigits;
if (bigDecimal.Sign == -1)
{
var mantissaBytes = bigDecimal.Mantissa.ToByteArray(false, true);
var mantissaBits = new BitArray(mantissaBytes);
var bytesQtyToSkip = 0;
mantissaBits = mantissaBits.Not();
mantissaBits.CopyTo(mantissaBytes, 0);
if (mantissaBytes[0] == 0)
{
bytesQtyToSkip = 1;
}
var restoredMantissa = new BigInteger(new ReadOnlySpan<byte>(mantissaBytes, bytesQtyToSkip, mantissaBytes.Length - bytesQtyToSkip), true, true);
integerPart = restoredMantissa.FirstDigits(integerPartDigits);
fractionPart = restoredMantissa.LastDigits(fractionPartDigits) + 1;
}
else
{
integerPart = bigDecimal.Mantissa.FirstDigits(integerPartDigits);
fractionPart = bigDecimal.Mantissa.LastDigits(fractionPartDigits);
}
}
if (integerPart.CountDigits() > conversionMaxDigitsInteger)
{
throw new NotSupportedException();
}
if (fractionPart.CountDigits() > conversionMaxDigitsFraction)
{
throw new NotSupportedException();
}
var resultBytes = new byte[conversionBytesForSqlBlob];
if (bigDecimal.Length > 0 && exponent < 0)
{
exponent = 0;
}
var biasedExponent = exponent + conversionBiasedMidPoint;
var biasedExponentBytes = BitConverter.GetBytes(biasedExponent).Take(conversionBytesForExponent).Reverse().ToArray();
if (bigDecimal.Sign >= 0)
{
biasedExponentBytes[0] |= (byte)(1 << 7);
}
biasedExponentBytes.CopyTo(resultBytes, 0);
if (integerPart != BigInteger.Zero)
{
var integerPartBytes = integerPart.ToByteArray(false, false);
integerPartBytes.CopyTo(resultBytes, conversionBytesForExponent + conversionBytesForInteger - integerPartBytes.Length);
resultBytes[resultBytes.Length - 2] = (byte)integerPartBytes.Length;
}
if (fractionPart != BigInteger.Zero)
{
var leadingZeroes = bigDecimal.DecimalPlaces - fractionPart.CountDigits();
var bitsForLeadingZeros = conversionBytesForLeadingZeroes * 8;
var leadingZerosMaxValue = (int)Math.Pow(2, bitsForLeadingZeros) - 1;
if (leadingZeroes > leadingZerosMaxValue)
{
throw new NotSupportedException();
}
//OK
var biasedLeadingZeroes = bigDecimal.Sign == -1 ? leadingZeroes : leadingZerosMaxValue - leadingZeroes;
BitConverter
.GetBytes(biasedLeadingZeroes)
.Take(conversionBytesForLeadingZeroes)
.Reverse()
.ToArray()
.CopyTo(resultBytes, conversionBytesForExponent + conversionBytesForInteger);
var fractionPartBytes = fractionPart.ToByteArray(false, true);
fractionPartBytes.CopyTo(resultBytes, conversionBytesForExponent + conversionBytesForInteger + conversionBytesForLeadingZeroes);
resultBytes[resultBytes.Length - 1] = (byte)fractionPartBytes.Length;
}
if (bigDecimal.Sign == -1)
{
//INFO: Invert bits of Integer/Fraction parts if number is negative
var startIndexIntegerPart = conversionBytesForExponent;
var endIndexIntegerPart = conversionBytesForExponent + conversionBytesForInteger - 1;
var startIndexFractionPart = conversionBytesForExponent + conversionBytesForInteger + conversionBytesForLeadingZeroes;
var endIndexFractionPart = conversionBytesForExponent + conversionBytesForInteger + conversionBytesForLeadingZeroes + conversionBytesForFraction - 1;
for (var idx = resultBytes.Length - 1; idx >= 0; idx--)
{
if (
(idx >= startIndexIntegerPart && idx <= endIndexIntegerPart ) ||
(idx >= startIndexFractionPart && idx <= endIndexFractionPart )
)
{
resultBytes[idx] = (byte)(~resultBytes[idx]);
}
}
}
return resultBytes;
}
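A minimal round-trip sketch of how the two methods above fit together, assuming they are members of BigDecimal as posted (the constructor call is just an easy, culture-independent way to build a sample value):

// 11.123 expressed as mantissa 11123 and exponent -3
BigDecimal original = new BigDecimal(new BigInteger(11123), -3);
byte[] blob = original.ToSqlBlob(); // 32 bytes, suitable for a BINARY(32) column
BigDecimal restored = BigDecimal.FromSqlBlob(blob);
// restored should compare equal to original (within the supported digit limits)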
-
Frequently it is required to store a BigDecimal in a database with the ability to use it in matching and ORDER BY clauses. I've implemented the following approach in my project:
I split the BigDecimal into whole and fractional parts, with a total size of 32 bytes.
Example:
11,123 -> 1100000000000000 ...00000001011 11111111 01111011000... 00000001 00000001
11,125 -> 1100000000000000 ...00000001011 11111111 01111101000... 00000001 00000001
12,123 -> 1100000000000000 ...00000001100 11111111 01111011000... 00000001 00000001
12,125 -> 1100000000000000 ...00000001100 11111111 01111101000... 00000001 00000001
-11,123 -> 0100000000000000 ...11111110100 00000000 10000100111... 00000001 00000001
-11,125 -> 0100000000000000 ...11111110100 00000000 10000010111... 00000001 00000001
-12,123 -> 0100000000000000 ...11111110011 00000000 10000100111... 00000001 00000001
-12,125 -> 0100000000000000 ...11111110011 00000000 10000010111... 00000001 00000001
This approach allows storing a BigDecimal with up to 34 digits for the whole part and 32 digits for the fractional part. The constants that define the layout:
int conversionBytesForSqlBlob = 32;
int conversionBytesForMainPart = conversionBytesForSqlBlob / 2;
int conversionBytesForExponent = 2;
int conversionBytesForInteger = conversionBytesForMainPart - conversionBytesForExponent;
int conversionBytesForFractionPart = conversionBytesForSqlBlob - conversionBytesForMainPart - 2;
int conversionBytesForLeadingZeroes = 1;
int conversionBytesForFraction = conversionBytesForFractionPart - conversionBytesForLeadingZeroes;
int conversionBitsForExponent = conversionBytesForExponent * 8 - 1;
int conversionBiasedMaxValue = 32768; //(int)Math.Pow(2, conversionBitsForExponent);
int conversionBiasedMidPoint = conversionBiasedMaxValue / 2;
int conversionMaxExponent = conversionBiasedMaxValue - conversionBiasedMidPoint;
int conversionMinExponent = -conversionBiasedMidPoint;
int conversionMaxDigitsInteger = 34;// (int)(conversionBytesForInteger * 8 * Math.Log10(2)) + 1;
int conversionMaxDigitsFraction = 32;// (int)(conversionBytesForFraction * 8 * Math.Log10(2)) + 1;
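For storage, the 32-byte blob itself can go into a fixed-size binary column and be ordered directly, since SQL Server compares binary values byte by byte from left to right. A minimal sketch; the table and column names are hypothetical:

// Hypothetical table: the blob is both the stored value and the sort key.
const string createTableSql = @"
CREATE TABLE BigDecimalBlobs (
    Id    INT IDENTITY PRIMARY KEY,
    Value BINARY(32) NOT NULL -- output of ToSqlBlob()
);";

// Because the encoding is order-preserving, ORDER BY on the raw bytes returns rows in numeric order.
const string selectSql = @"
SELECT Id, Value
FROM BigDecimalBlobs
ORDER BY Value;";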