What is a schema?

Programming languages usually do not separate the concept of a schema from the representation of the data it describes. But what exactly is a schema?

I interpret a schema as the logical structure of data, without any concrete representation or realization of that data.

This means that a declaration such as

struct foo {
  int32_t a;
  float b;
};

does two things. First, it states that the struct foo contains two members, a and b, of the types int32_t and float. But at the same time it already decides what the data layout looks like. It is impossible to represent the same structure in two different layouts. It is also very cumbersome to write efficient reflective algorithms for such types.

One solution is to describe the schema with templates over empty structs, like so:

template<typename T, uint64_t N>
struct Primitive {};

struct SignedInteger{};
struct FloatingPoint{};

using Int32 = Primitive<SignedInteger, 4>;
using Float32 = Primitive<FloatingPoint,4>;

template<typename V, string_literal K>
struct Member {};

template<typename... Members>
struct Struct {};

using Foo = Struct<
 Member<Int32, "a">,
 Member<Float32, "b">
>;

Here we have no storage at all. To distinguish between different data layouts, we also introduce empty tag structs for the encodings.

struct Native {};
struct CustomFastEncoding {};

Then we introduce the actual storage for a given schema and encoding:

template<typename Schema, typename Encoding>
class data;

template<typename... Values, string_literal... Keys>
class data<Struct<Member<Values, Keys>...>, Native> {
private:
 std::tuple<data<Values,Native>...> values_;
public:
 // deduce type with some template magic
 template<string_literal K>
 auto& get();
};

template<typename T, uint64_t N>
class data<Primitive<T,N>, Native> {
private:
 typename native_data_type<Primitive<T,N>>::type value_;
public:
 // Add getters and setters etc.
};

template<typename T, uint64_t N>
class data<Primitive<T,N>, CustomFastEncoding> {
private:
 std::shared_ptr<std::vector<uint8_t>> data_;
 uint64_t shift_;
public:

};

I know that the C++ abstract machine basically prevents me from using the CustomFastEncoding, since C++ requires that any object I read or write has the type it was initialized as. For now I am waiting for the relevant standard proposal to go through so that I can use this kind of approach safely. Until then, compilers happily manipulate untyped (void) data without a hitch.
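Until such a proposal is usable, one well-defined alternative is to copy bytes in and out of the buffer with std::memcpy instead of reinterpreting them in place. The sketch below assumes that approach. It is not necessarily what the author intends for CustomFastEncoding, and the constructor, get, set, native_data_type specializations and fast_encoding_roundtrip are hypothetical names added for illustration. Values are stored in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

template <typename T, uint64_t N> struct Primitive {};
struct SignedInteger {};
struct FloatingPoint {};
using Int32 = Primitive<SignedInteger, 4>;
using Float32 = Primitive<FloatingPoint, 4>;

// Assumed mapping from schema primitives to native C++ types.
template <typename Schema> struct native_data_type;
template <> struct native_data_type<Int32> { using type = int32_t; };
template <> struct native_data_type<Float32> { using type = float; };

struct CustomFastEncoding {};

template <typename Schema, typename Encoding> class data;

template <typename T, uint64_t N>
class data<Primitive<T, N>, CustomFastEncoding> {
 public:
  using value_type = typename native_data_type<Primitive<T, N>>::type;
  static_assert(sizeof(value_type) == N, "schema size must match the native type");

  data(std::shared_ptr<std::vector<uint8_t>> buffer, uint64_t shift)
      : data_{std::move(buffer)}, shift_{shift} {
    assert(shift_ + N <= data_->size());
  }

  // Copy the bytes out into a real object instead of aliasing the buffer.
  value_type get() const {
    value_type value;
    std::memcpy(&value, data_->data() + shift_, N);
    return value;
  }

  // Copy the object representation into the shared buffer.
  void set(value_type value) {
    std::memcpy(data_->data() + shift_, &value, N);
  }

 private:
  std::shared_ptr<std::vector<uint8_t>> data_;
  uint64_t shift_;
};

// Usage: two views into one shared 8-byte buffer at different offsets.
inline bool fast_encoding_roundtrip() {
  auto buffer = std::make_shared<std::vector<uint8_t>>(8);
  data<Int32, CustomFastEncoding> a{buffer, 0};
  data<Float32, CustomFastEncoding> b{buffer, 4};
  a.set(-7);
  b.set(1.5f);
  return a.get() == -7 && b.get() == 1.5f;
}
```

The trade-off is that every access goes through a copy, so there is no reference into the buffer, although compilers commonly optimize small fixed-size memcpy calls into plain loads and stores.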

Published on Sun 04 August 2024
Author: Claudius "keldu" Holeksa