• Artyom@lemm.ee
    English
    arrow-up
    65
    arrow-down
    5
    ·
    9 months ago

    The idea that any scientist is doing data analysis in Excel is honestly terrifying on every level.

    • griffinsklow@feddit.de
      English
      arrow-up
      18
      arrow-down
      1
      ·
      9 months ago

      I remember when a biologist asked us for help: Excel crashed while processing his 700 MB tables. It took some time, and ChatGPT, to convince him to do the analysis in R. It worked out in the end, and he is now recommending this solution to his colleagues, which is nice.

    • Evotech@lemmy.world
      English
      arrow-up
      12
      arrow-down
      7
      ·
      9 months ago

      Excel is excellent at data analysis… Python integrations and everything

        • filcuk@lemmy.zip
          English
          arrow-up
          12
          arrow-down
          1
          ·
          edit-2
          9 months ago

          Because every scientist is also a programmer?
          Especially if they struggle to use Excel properly, no chance.

    • Wooshock@lemmy.world
      English
      arrow-up
      3
      arrow-down
      10
      ·
      9 months ago

      What the hell else is there? Good luck getting universities to use OpenOffice.

      • asdfasdfasdf@lemmy.world
        English
        arrow-up
        18
        arrow-down
        4
        ·
        edit-2
        9 months ago

        Scientists should be using programming languages like R or Python. They are both extremely popular in this field, much more than Excel.

          • Hawk@lemmynsfw.com
            English
            arrow-up
            4
            arrow-down
            3
            ·
            9 months ago

            Except every scientist and analyst. Stats, data science and ML are done in R and Python, be it astro, health data or genomics.

            If someone has been taught stats in spreadsheet software, they have been taught wrong, period.

            Also, programming is a very strong term. We’re talking about stats in a scripting language, not software development in C++.
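
A minimal sketch of what "stats in a scripting language" can look like in Python, using only the standard library. The table contents and column names are made up for illustration. The point is that every field is read as plain text, and only the column you choose is converted to numbers.

```python
import csv
import io
import statistics

# Hypothetical two-column table; in practice this would be a file on disk.
raw = "gene,expression\nSEPT2,1.5\nMARCH1,2.5\n"

# csv.DictReader returns every field as a string, so gene symbols
# are never reinterpreted as dates or numbers.
rows = list(csv.DictReader(io.StringIO(raw)))
symbols = [row["gene"] for row in rows]

# Numeric conversion is explicit and applied only to the chosen column.
expression = [float(row["expression"]) for row in rows]
mean_expression = statistics.mean(expression)

print(symbols, mean_expression)
```

Nothing here is more "programming" than a spreadsheet formula, but the conversion step is visible and repeatable.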

          • isles@lemmy.world
            English
            arrow-up
            2
            arrow-down
            2
            ·
            9 months ago

            Research projects almost exclusively have more than one person working on them.

  • macrocephalic@lemmy.world
    English
    arrow-up
    33
    arrow-down
    1
    ·
    9 months ago

    Now if only it would stop dropping leading zeros unless you ask it to, and we got rid of the MM/DD/YYYY date format entirely.

    • theparadox@lemmy.world
      English
      arrow-up
      5
      ·
      9 months ago

      Now if only it would stop dropping leading zeros unless you ask it to

      That appears to actually be a feature.

      • EngineerGaming@feddit.nl
        English
        arrow-up
        4
        arrow-down
        1
        ·
        9 months ago

        I think the point was that the format itself is odd. I am European and it’s weird to me: logically it should be either from greatest to smallest, or from smallest to greatest, not a weird in-between.
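
One practical consequence of the greatest-to-smallest ordering can be sketched in Python: year-month-day strings (the ISO 8601 form) sort chronologically as plain text, while MM/DD/YYYY strings do not. The three dates below are arbitrary examples.

```python
from datetime import date

# Arbitrary example dates, deliberately out of order.
dates = [date(2023, 12, 1), date(2024, 1, 15), date(2023, 2, 28)]
chronological = sorted(dates)

# Sort the text representations, as a spreadsheet would sort a text column.
iso_sorted = sorted(d.isoformat() for d in dates)
us_sorted = sorted(d.strftime("%m/%d/%Y") for d in dates)

# Year-month-day text sorts in true date order; month-first text does not.
iso_matches = iso_sorted == [d.isoformat() for d in chronological]
us_matches = us_sorted == [d.strftime("%m/%d/%Y") for d in chronological]

print(iso_sorted, iso_matches)
print(us_sorted, us_matches)
```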

  • MelodiousFunk@kbin.social
    arrow-up
    29
    ·
    9 months ago

    Me before reading the article: It’s got to be dates. Excel thinks everything is a date.

    Me after reading the article: Even the workaround is halfhearted. Jeebus.

    • TwinHaelix@reddthat.com
      English
      arrow-up
      12
      ·
      9 months ago

      Microsoft’s blog adds caveats, such as that Excel avoids the conversion by saving the data as text, which means the data may not work for calculations later. There’s also a known issue where you can’t disable the conversions when running macros.
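
The trade-off in that caveat (values kept as text are preserved exactly but cannot be used in arithmetic directly) is not specific to Excel. A small Python sketch, with made-up cell values, shows the same thing: text keeps its leading zeros, and calculations need an explicit conversion.

```python
# Hypothetical cell contents, all stored as text.
cells = ["007", "12", "3.5"]

# Text is preserved exactly, including leading zeros...
first_cell = cells[0]

# ...but arithmetic on text fails until it is converted.
try:
    sum(cells)
    summed_as_text = True
except TypeError:
    summed_as_text = False

# Explicit conversion makes the calculation possible again.
total = sum(float(c) for c in cells)

print(first_cell, summed_as_text, total)
```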

  • Kethal@lemmy.world
    English
    arrow-up
    27
    arrow-down
    1
    ·
    9 months ago

    Microsoft fixes one of the Excel features that wreck scientific data.

  • MonkderZweite@feddit.ch
    English
    arrow-up
    26
    ·
    9 months ago

    20 years after the problem was first reported.

    Meaning there’s still hope for XDG support in Firefox?

  • JoBo@feddit.uk
    English
    arrow-up
    26
    ·
    9 months ago

    It’s no good having this as part of the user options. It should be a sheet characteristic and the default should be “keep cells exactly as entered regardless of data type”.

    • kalleboo@lemmy.world
      English
      arrow-up
      12
      arrow-down
      6
      ·
      edit-2
      9 months ago

      Changing the default will break the workflows of tens of thousands of business users.

      Scientists should be using something like MATLAB, not Excel.

      • RheingoldRiver@kbin.social
        arrow-up
        3
        ·
        9 months ago

        You could make a new file type, default new versions to it, and not break compatibility. It wouldn’t do anything for existing workbooks, and xlsx would remain an option, but “it would break compatibility” is not a be-all and end-all argument against this.

      • JoBo@feddit.uk
        English
        arrow-up
        4
        arrow-down
        1
        ·
        9 months ago

        They’re not doing their analysis in Excel. MATLAB solves no problems here?

  • CatLikeLemming@lemmy.blahaj.zone
    English
    arrow-up
    24
    arrow-down
    5
    ·
    9 months ago

    This isn’t a fix. Excel wasn’t meant for this. While I do understand it’s convenient as a database, unless you’re doing something unimportant and small you just really should use something proper. And even now that this “problem” is gone, I am certain there are still more things that cause trouble. You can not satisfy everyone and Excel was just… not made for gene info storage.

    Even if you don’t want to use stuff that isn’t Microsoft Office, that comes with Microsoft Access, which is a proper database management system. It’s literally in the same software package, so why do people refuse to use it?

    • zalgotext@sh.itjust.works
      English
      arrow-up
      38
      arrow-down
      1
      ·
      9 months ago

      Why would you need a full blown (shitty) relational database management system to store gene info? Excel should be just fine for storing data in arbitrary tables. It shouldn’t make assumptions about your data by default, and changing values that look like they’re in a specific format should be opt-in, not default behavior.

      • SirQuackTheDuck@lemmy.world
        English
        arrow-up
        8
        arrow-down
        1
        ·
        9 months ago

        It shouldn’t make assumptions about your data by default, and changing values that look like they’re in a specific format should be opt-in, not default behavior

        But that’s exactly what made the “auto” data type of Excel such a powerful tool when introduced. If you’re storing text, make the datatype “text”, problem solved.

        Nowadays, when making stuff like Excel from scratch, you could opt for a “these look like dates, change the type from ‘none’ to ‘date’?” but with middle management being conditioned on the data type being ‘auto’, that’s something that’s hard to change.

        • schnurrito@discuss.tchncs.de
          English
          arrow-up
          25
          arrow-down
          1
          ·
          9 months ago

          Optimist: The glass is half full.

          Pessimist: The glass is half empty.

          Realist: The glass is twice as big as necessary.

          Excel: The glass is the 2nd of January.

        • CatLikeLemming@lemmy.blahaj.zone
          English
          arrow-up
          4
          arrow-down
          2
          ·
          9 months ago

          Honestly, I’d say you shouldn’t do that prompt method. The auto type is genuinely great for the use cases which Excel is supposed to be used for, from someone managing their household finances to charting the growth of a business.

          By all means, it absolutely should make assumptions about your data by default, as that’s incredibly convenient for the average user. You can always change the type of a cell afterwards if what you’re doing is special.

      • CatLikeLemming@lemmy.blahaj.zone
        English
        arrow-up
        4
        arrow-down
        3
        ·
        9 months ago

        That is not what it was made for. It was made to do shenanigans with values like doing math on them and plotting graphs. If you merely want data storage, use a table. I agree, a database is overkill for most things, but that doesn’t change the fact that Excel is the wrong tool for the job. Maybe if they added a table mode where it’s basically just a frontend for a csv it’d work, but right now I’d still say it’s better to use a scalpel than a hammer, even if scissors do the trick just fine.

      • Hawk@lemmynsfw.com
        English
        arrow-up
        1
        ·
        9 months ago

        SQLite and DuckDB are great; I don’t know about shitty.

        You don’t get the visual feedback, but the query language, reliability and Python interface are all top notch.
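
As a rough illustration of the Python interface, here is a sketch using the standard-library `sqlite3` module with an in-memory database. The table name, column names and the second accession value are invented for the example. Columns declared as `TEXT` keep values exactly as entered, so nothing is turned into a date or a float.

```python
import sqlite3

# In-memory database for illustration; a real project would use a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: both columns are TEXT, so values are stored verbatim.
conn.execute("CREATE TABLE genes (symbol TEXT NOT NULL, accession TEXT NOT NULL)")

# "0001234" is a made-up accession used to show that leading zeros survive.
rows = [("SEPT2", "2310009E13"), ("MARCH1", "0001234")]
conn.executemany("INSERT INTO genes VALUES (?, ?)", rows)

stored = conn.execute(
    "SELECT symbol, accession FROM genes ORDER BY rowid"
).fetchall()
conn.close()

print(stored)
```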

      • CatLikeLemming@lemmy.blahaj.zone
        English
        arrow-up
        3
        arrow-down
        1
        ·
        9 months ago

        I’ve never used Access personally, so I don’t know if it’s any good or not, I’m just frustrated by people using spreadsheets for data storage.

        • Evotech@lemmy.world
          English
          arrow-up
          2
          arrow-down
          1
          ·
          9 months ago

          It’s been years since I used it tbh. But “access bad” is a meme for a reason

    • Echo Dot@feddit.uk
      English
      arrow-up
      6
      arrow-down
      3
      ·
      9 months ago

      I’m so sick of people using Excel for things it’s not supposed to be used for.

      As a general rule if you’re not actually making use of the formula tool, you probably don’t need to be using Excel.

  • chepox@sopuli.xyz
    English
    arrow-up
    13
    ·
    9 months ago

    "Microsoft’s blog adds caveats, such as that Excel avoids the conversion by saving the data as text, which means the data may not work for calculations later. There’s also a known issue where you can’t disable the conversions when running macros. "

    This sounds very half assed…

  • detalferous@lemm.ee
    English
    arrow-up
    13
    ·
    9 months ago

    From the article:

    The problem of Excel software (Microsoft Corp., Redmond, WA, USA) inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004 [1]. For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively. Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’). Since that report, we have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. ‘SEPT2’ converted to ‘2006/09/02’). This suggests that gene name errors continue to be a problem in supplementary files accompanying articles.
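
The kinds of identifiers described in that excerpt can be flagged before they ever reach a spreadsheet. The following Python sketch is only an approximation: the month list and both patterns are assumptions made for illustration, not Excel's actual parsing rules, and `conversion_risk` is a hypothetical helper name.

```python
import re

# Month names and abbreviations that could start a date-like token.
# This list is an assumption for the sketch, not Excel's real rule set.
MONTHS = (
    "JAN", "FEB", "MARCH", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEPT", "SEP", "OCT", "NOV", "DEC",
)

# A month prefix followed by one or two digits, e.g. SEPT2 or MARCH1.
DATE_LIKE = re.compile(r"^(?:%s)-?\d{1,2}$" % "|".join(MONTHS), re.IGNORECASE)

# Digits, the letter E, then digits, e.g. 2310009E13.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def conversion_risk(identifier):
    """Return 'date', 'float', or None for a gene symbol or accession."""
    if DATE_LIKE.match(identifier):
        return "date"
    if FLOAT_LIKE.match(identifier):
        return "float"
    return None


for name in ["SEPT2", "MARCH1", "2310009E13", "TP53"]:
    print(name, conversion_risk(name))
```

Running a check like this over a gene list before sharing it as a spreadsheet gives a quick list of entries worth double-checking.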

  • neuropean@kbin.social
    arrow-up
    11
    ·
    9 months ago

    Thank god! You have no idea how awful this is for scientists. Need to paste some gene names down? Better hope it’s not MARCH8 or in the Septin gene family, otherwise you have to convert columns to text then import the data. Seems like a simple fix, but many wet lab biologists are technologically challenged.

  • AutoTL;DR@lemmings.world
    English
    arrow-up
    3
    ·
    9 months ago

    This is the best summary I could come up with:


    In 2020, scientists decided just to rework the alphanumeric symbols they used to represent genes rather than try to deal with an Excel feature that was interpreting their names as dates and (un)helpfully reformatting them automatically.

    Yesterday, a member of the Excel team posted that the company is rolling out an update on Windows and macOS to fix that.

    Excel’s automatic conversions are intended to make it easier and faster to input certain types of commonly entered data — numbers and dates, for instance.

    But for scientists using quick shorthand to make things legible, it could ruin published, peer-reviewed data, as a 2016 study found.

    Microsoft detailed the update in a blog post this week, adding a checkbox labeled “Convert continuous letters and numbers to a date.” You can probably guess what that toggles.

    The update builds on the Automatic Data Conversions settings the company added last year, which included the option for Excel to warn you when it’s about to get extra helpful and let you load your file without automatic conversion so you can ensure nothing will be screwed up by it.


    The original article contains 225 words, the summary contains 184 words. Saved 18%. I’m a bot and I’m open source!

    • JackGreenEarth@lemm.ee
      English
      arrow-up
      2
      arrow-down
      4
      ·
      9 months ago

      Why are scientists using a paid service such as Excel anyway? Shouldn’t they be using something like LibreOffice?

      • LogarithmicCamel@feddit.uk
        English
        arrow-up
        4
        ·
        9 months ago

        You are completely right, and the Open Science movement is catching on. The idea is to give everyone access to the (anonymised) data and use only tools that are freely accessible, even to scientists from developing countries without Microsoft licenses, so that they too can rerun your analyses and verify your results. You shouldn’t be getting downvoted.

      • driving_crooner@lemmy.eco.br
        English
        arrow-up
        1
        ·
        edit-2
        9 months ago

        In college a professor gave us some homework to be done in Excel, and as the nerd that I am, I asked if LibreOffice was OK because I use Linux and have no access to Excel. The professor was like, well, in that case everyone do the homework in R or Python. My classmates were really mad at me for that.

      • Tavarin@lemmy.ca
        English
        arrow-up
        1
        ·
        9 months ago

        I’ve had the same copy of Excel since high school, and it’s done a damn fine job processing experimental data through undergrad, my PhD, and 6 years as a working researcher.

        It’s also the software pretty much everyone has, so you can easily share data with collaborators and other researchers. And it has a ton of functionality so you can process and analyze data easily, and create the visuals for papers very easily.

        • emergencyfood@sh.itjust.works
          English
          arrow-up
          2
          arrow-down
          1
          ·
          9 months ago

          In science, it is important to have verifiable and replicable results. This means everything you use - from ingredients to software - should be transparent. We can’t examine Excel’s source code, so we don’t know if it is working as it claims to be. Most scientific disciplines are moving towards open source, open access etc., and you can’t use Excel in fields like physics or mathematical biology. But molecular biology is a bit of a holdout.

  • Etterra@lemmy.world
    English
    arrow-up
    6
    arrow-down
    3
    ·
    9 months ago

    LibreOffice is free, and modern MS Office UIs look like dog dookie. LibreOffice can also save in Excel format if you want.

    Hey look at that, I found a solution that didn’t require they change their entire process or have to wait for Microsloughed to get their act together.

  • Deebster@programming.dev
    English
    arrow-up
    1
    ·
    edit-2
    9 months ago

    It’s too late though, scientists already had to rename the genes. Although of course there are other things that can trigger it, not just in science.