How do you debug a binary format?




















4
















I would like to be able to debug a binary builder that I am building. Right now I am basically printing out the input data to the binary parser, then going deep into the code and printing out the mapping of the input to the output, then taking the output mapping (integers) and using that to locate the corresponding integers in the binary. This is pretty clunky, and it requires that I modify the source code deeply to get at the mapping between input and output.



What it seems like you could do is view the binary in different variants (in my case I'd like to view it in 8-bit chunks as decimal numbers, because that's pretty close to the input). Actually, some numbers are 16-bit, some 8-bit, some 32-bit, etc. So maybe there would be a way to view the binary with each of these different numbers highlighted in memory in some way.



The only way I could see that being possible is if you actually build a visualizer specific to the actual binary format/layout, so that it knows where in the sequence the 32-bit numbers should be, where the 8-bit numbers should be, etc. This is a lot of work and kind of tricky in some situations, so I'm wondering whether there's a general way to do it.



I'm also wondering what the usual way of debugging this type of thing currently is, so that I can get some ideas on what to try.







debugging binary
















asked 13 hours ago
Lance Pollard








  • 7





    You got one answer saying "use the hexdump directly, and do this and that additionally", and that answer got a lot of upvotes. And a second answer, 5 hours later(!), saying only "use a hexdump". Then you accepted the second one in favor of the first? Seriously?

    – Doc Brown
    5 hours ago













  • While you might have a good reason to use a binary format, do consider whether you can just use an existing text format like JSON instead. Human readability counts a lot, and machines and networks are typically fast enough that using a custom format to reduce size is unnecessary nowadays.

    – jpmc26
    4 hours ago








  • 1





    The tool you want is a hex editor. If you're using Windows, HxD is pretty good.

    – immibis
    2 hours ago
























4 Answers








        2




















        One possibility would be to show a hex dump and show the value of the currently selected/highlighted bytes as a number.
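A minimal sketch of that idea in Python, assuming the "selection" is simply a start offset and a length. The helper names `hexdump` and `selection_value` are made up for illustration, and the byte order is a guess that has to match the format being debugged.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a classic offset / hex / ASCII dump of data."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." text
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


def selection_value(data: bytes, start: int, length: int,
                    byteorder: str = "little", signed: bool = False) -> int:
    """Interpret data[start:start + length] as a single integer."""
    selected = data[start:start + length]
    if len(selected) != length:
        raise ValueError("selection runs past the end of the data")
    return int.from_bytes(selected, byteorder, signed=signed)


# Hypothetical sample: one 8-bit, one 16-bit and one 32-bit number.
sample = bytes([7, 0x34, 0x12, 0x78, 0x56, 0x34, 0x12])
print(hexdump(sample))
print(selection_value(sample, 0, 1))  # the 8-bit number
print(selection_value(sample, 1, 2))  # the 16-bit number at offset 1
print(selection_value(sample, 3, 4))  # the 32-bit number at offset 3
```

Selecting the same bytes with `byteorder="big"` is a quick way to check whether an unexpected value is really an endianness mix-up.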

















answered 8 hours ago
Solomon Ucko

































            20




















            For ad-hoc checks, just use a standard hexdump and learn to eyeball it.



To tool up for a proper investigation, I usually write a separate decoder in something like Python. Ideally this is driven directly from a message spec document or IDL and is as automated as possible (so there's no chance of manually introducing the same bug in both decoders).
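As a rough sketch of such a separate decoder, the Python below walks a small layout table and prints each field with its offset, raw bytes and decimal value. The field names, the `LAYOUT` table and the little-endian byte order are assumptions made up for illustration; in practice the table would be generated from the spec or IDL rather than typed by hand. `bytes.hex(" ")` needs Python 3.8 or later.

```python
import struct

# Hypothetical message layout: (field name, struct format code).
# B = unsigned 8-bit, H = unsigned 16-bit, I = unsigned 32-bit.
LAYOUT = [
    ("version", "B"),
    ("flags", "B"),
    ("length", "H"),
    ("checksum", "I"),
]


def decode(data: bytes, layout=LAYOUT, byteorder: str = "<"):
    """Yield (offset, size, name, value) for each field in the layout."""
    offset = 0
    for name, code in layout:
        fmt = byteorder + code  # "<" means little-endian with no padding
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, data, offset)
        yield offset, size, name, value
        offset += size


def annotate(data: bytes) -> str:
    """Render one line per field: offset, raw hex bytes, name, decimal value."""
    rows = []
    for offset, size, name, value in decode(data):
        raw = data[offset:offset + size].hex(" ")
        rows.append(f"{offset:04x}  {raw:<11}  {name:<8} = {value}")
    return "\n".join(rows)


# Build a sample message with the same assumed layout, then decode it.
message = struct.pack("<BBHI", 1, 0, 513, 305419896)
print(annotate(message))
```

Because the decoder only knows the table, a wrong field width shows up as every later field being shifted, which is usually easy to spot in the annotated output.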



            Lastly, don't forget you should be writing unit tests for your decoder, using known-correct canned input.
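A small sketch of what such a test might look like with Python's `unittest`. The decoder `decode_header`, its three-byte layout and the canned bytes are all hypothetical stand-ins; the point is only that the expected values are worked out by hand from the spec, not produced by the code under test.

```python
import struct
import unittest


def decode_header(data: bytes) -> dict:
    """Hypothetical decoder under test: u8 version, u16 length, little-endian."""
    version, length = struct.unpack_from("<BH", data, 0)
    return {"version": version, "length": length}


# Canned input, checked by hand against the (assumed) spec:
# 02 = version 2, 10 00 = length 16.
KNOWN_GOOD = bytes.fromhex("02 10 00")


class DecodeHeaderTest(unittest.TestCase):
    def test_known_good_message(self):
        self.assertEqual(decode_header(KNOWN_GOOD),
                         {"version": 2, "length": 16})

    def test_truncated_message_is_rejected(self):
        # unpack_from raises struct.error when the buffer is too short.
        with self.assertRaises(struct.error):
            decode_header(KNOWN_GOOD[:2])

# Run with: python -m unittest <name of this file>
```

Keeping a handful of real captured messages next to the tests gives the builder and the decoder a shared set of fixtures to check against.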

















answered 12 hours ago
Useless













            • "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

              – Mast
              9 hours ago






































            4




















            ASN.1, Abstract Syntax Notation One, provides a way of specifying a binary format.




• DDT - Develop using sample data and unit tests.

• A textual dump can be helpful. If it is in XML, you can collapse and expand subhierarchies.

• ASN.1 is not strictly needed, but a grammar-based, more declarative file specification makes this easier.
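To illustrate the textual-dump point, here is a minimal Python sketch that decodes a made-up record and writes it out as XML, with each element carrying its byte offset and size. The record layout (a kind byte, a 16-bit count, then that many 8-bit items) and the function name `to_xml` are assumptions for the example only. `ET.indent` requires Python 3.9 or later.

```python
import struct
import xml.etree.ElementTree as ET


def to_xml(data: bytes) -> str:
    """Decode a hypothetical record and return it as indented XML text."""
    # Assumed layout: u8 kind, u16 count, then `count` u8 items (little-endian).
    kind, count = struct.unpack_from("<BH", data, 0)
    root = ET.Element("record", offset="0")
    ET.SubElement(root, "kind", offset="0", size="1").text = str(kind)
    ET.SubElement(root, "count", offset="1", size="2").text = str(count)
    items = ET.SubElement(root, "items", offset="3", size=str(count))
    for i in range(count):
        (value,) = struct.unpack_from("<B", data, 3 + i)
        ET.SubElement(items, "item", offset=str(3 + i), size="1").text = str(value)
    ET.indent(root)  # pretty-print so the nesting is visible
    return ET.tostring(root, encoding="unicode")


print(to_xml(bytes([5, 2, 0, 10, 20])))
```

Opening the result in any editor or browser that folds XML gives the collapse/expand view, and two dumps can be compared with an ordinary text diff.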

















answered 11 hours ago
Joop Eggen








            • 2





              If the never-ending parade of security vulnerabilities in ASN.1 parsers is any indication, adopting it would certainly provide good exercise in debugging binary formats.

              – Mark
              5 hours ago

































            3






















The first step is to find or define a grammar that describes the structure of the data, i.e. a schema.



An example of this is a language feature of COBOL which is informally known as a copybook. In COBOL programs you would define the structure of the data in memory, and this structure mapped directly to the way the bytes were stored. This is common to languages of that era, as opposed to common contemporary languages, where the physical layout of memory is an implementation concern that is abstracted away from the developer.
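This is not COBOL, but the same idea of a declared structure that maps directly onto bytes can be sketched in Python with `ctypes`. The `Header` record and its fields are invented for the example, and the fields are ordered so that natural alignment adds no padding; a real format may need explicit packing.

```python
import ctypes


class Header(ctypes.LittleEndianStructure):
    """Hypothetical record whose declared layout is also its byte layout."""
    # Ordered so that natural alignment leaves no gaps between fields.
    _fields_ = [
        ("version", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),
        ("length", ctypes.c_uint16),
        ("checksum", ctypes.c_uint32),
    ]


raw = bytes([1, 0, 0x10, 0x00, 0x78, 0x56, 0x34, 0x12])
header = Header.from_buffer_copy(raw)

for name, _ctype in Header._fields_:
    field = getattr(Header, name)  # descriptor carrying byte offset and size
    print(f"{name:<8} offset={field.offset} size={field.size} "
          f"value={getattr(header, name)}")
```

The declaration doubles as a schema: the offsets and sizes that a visualizer would need come from the structure definition itself instead of from a second, hand-maintained description.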



A Google search for "binary data schema language" turns up a number of tools. One example is Apache DFDL. There may already be a UI for this as well.















edited 12 hours ago
answered 12 hours ago
JimmyJames








            • 1





              This feature is not limited to 'ancient' era languages. C and C++ structs and unions can be memory aligned. C# has StructLayoutAttribute, which I have used to transmit binary data.

              – Kasper van den Berg
              11 hours ago






            • 1





              @KaspervandenBerg Unless you are saying that C and C++ added these recently, I consider that the same era. The point is that these formats were not simply for data transmission, though they were used for that, they mapped directly to how the code worked with data in memory and on disk. That's not, in general, how newer languages tend to work though they may have such features.

              – JimmyJames
              11 hours ago































