
    Update your Linux driver for Areca ARC-1883 SAS RAID Adapter

    Recently, while running planned test scenarios with different hardware parts, our QA team identified a kernel panic during read operations on the Areca ARC-1883 SAS RAID Adapter. We notified Areca, and thanks to their fast reaction we were able to resolve the problem quickly. Here’s an overview.

    The problem

    During sequential read operations, a kernel panic occurred on Linux. As it turned out, the newer the kernel version, the faster the system would hang.

    Call trace from the dying system:

    BUG: unable to handle kernel paging request at ffff8800ffffffc8
    IP: [<ffffffffa01be89d>] arcmsr_drain_donequeue+0xd/0x70 [arcmsr]
    PGD 1a86063 PUD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.4/speed
    CPU 12
    Modules linked in: arcmsr(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tab]
    Pid: 3576, comm: dd Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF
    RIP: 0010:[<ffffffffa01be89d>]  [<ffffffffa01be89d>] arcmsr_drain_donequeue+0xd/0x70 [arcmsr]
    RSP: 0018:ffff88089c483e38  EFLAGS: 00010082
    RAX: ffffc90016ea00c8 RBX: ffff8810731885e0 RCX: ffffc90016ea0020
    RDX: 0000000000000001 RSI: ffff8800ffffffb0 RDI: ffff8810731885e0
    RBP: ffff88089c483e48 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90016ea0030
    R13: 0000000000000008 R14: 0000000000000010 R15: 0000000000000001
    FS:  00007f7f733e5700(0000) GS:ffff88089c480000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffff8800ffffffc8 CR3: 0000000f2b97c000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process dd (pid: 3576, threadinfo ffff8809c7172000, task ffff8810712b8ae0)
    Stack:
    ffff88089c483e48 ffffffff81095628 ffff88089c483eb8 ffffffffa01beeff
    <d> 0000000000000005 ffff88089c483e90 ffff88107318acd8 ffffc90016ea0020
    <d> ffffc90016ea00c8 ffffc90016ea0030 0000000000000100 ffff88107026dc40
    Call Trace:
    <IRQ>
    [<ffffffff81095628>] ? schedule_work+0x18/0x20
    [<ffffffffa01beeff>] arcmsr_interrupt+0x5ff/0x6a0 [arcmsr]
    [<ffffffffa01befb1>] arcmsr_do_interrupt+0x11/0x20 [arcmsr]
    [<ffffffff810e6eb0>] handle_IRQ_event+0x60/0x170
    [<ffffffff8107a93f>] ? __do_softirq+0x11f/0x1e0
    [<ffffffff810e980e>] handle_edge_irq+0xde/0x180
    [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
    [<ffffffff8100faf9>] handle_irq+0x49/0xa0
    [<ffffffff815315fc>] do_IRQ+0x6c/0xf0
    [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
    <EOI>
    [<ffffffff81136bc9>] ? activate_page+0x189/0x1a0
    [<ffffffff81136bb9>] ? activate_page+0x179/0x1a0
    [<ffffffff81136c21>] mark_page_accessed+0x41/0x50
    [<ffffffff811213c3>] generic_file_aio_read+0x2c3/0x700
    [<ffffffff811c4841>] blkdev_aio_read+0x51/0x80
    [<ffffffff81188e7c>] ? do_sync_read+0xec/0x140
    [<ffffffff81188e8a>] do_sync_read+0xfa/0x140
    [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
    [<ffffffff812334d6>] ? selinux_file_permission+0x26/0x150
    [<ffffffff812335ab>] ? selinux_file_permission+0xfb/0x150
    [<ffffffff81226496>] ? security_file_permission+0x16/0x20
    [<ffffffff81189775>] vfs_read+0xb5/0x1a0
    [<ffffffff811975bd>] ? path_put+0x1d/0x40
    [<ffffffff811898b1>] sys_read+0x51/0x90
    [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
    [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
    Code: ff ff c6 02 00 48 8d 7a 01 40 b6 5f e9 5f ff ff ff 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 0f 1f 44 00 00 <4c> 8b 46 18 49 39 f
    RIP  [<ffffffffa01be89d>] arcmsr_drain_donequeue+0xd/0x70 [arcmsr]
    RSP <ffff88089c483e38>
    CR2: ffff8800ffffffc8


    Preliminary tests showed that the same scenario performed on an older kernel runs longer, but eventually produces the same result: a kernel panic.

    What was the solution?

    Our development team analyzed the driver and immediately informed Areca about the part of the code where this issue occurred. The issue was caused by obtaining a wrong Command Control Block pointer value in the arcmsr_hbaC_postqueue_isr function.

    In the arcmsr_hbaC_postqueue_isr function of the old driver:

    flag_ccb = readl(&phbcmu->outbound_queueport_low);
    ccb_cdb_phy = (flag_ccb & 0xFFFFFFF0); /* frame must be 32 bytes aligned */
    arcmsr_cdb = (struct ARCMSR_CDB *)(acb->vir2phy_offset + ccb_cdb_phy);
    ccb = container_of(arcmsr_cdb, struct CommandControlBlock, arcmsr_cdb); /* <- ccb points to the wrong address */
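    To see why a wrong arcmsr_cdb pointer is fatal, it helps to look at the container_of idiom in isolation. The following is a minimal user-space sketch, not the real driver code: the struct layouts are simplified assumptions, and only the field names mirror the driver. container_of recovers the enclosing CommandControlBlock purely by pointer arithmetic, so if the address translated from the adapter's done-queue value is bogus, the resulting ccb points into unmapped memory and the next dereference causes exactly the kind of paging-request oops shown in the trace above.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for the kernel's container_of macro:
     * step back from a member pointer to the enclosing struct. */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Hypothetical, simplified layouts -- not the driver's real definitions. */
    struct ARCMSR_CDB { uint32_t words[8]; };   /* 32-byte command frame */

    struct CommandControlBlock {
        uint32_t state;
        struct ARCMSR_CDB arcmsr_cdb;           /* embedded command frame */
    };

    int main(void)
    {
        struct CommandControlBlock ccb_storage;
        struct ARCMSR_CDB *arcmsr_cdb = &ccb_storage.arcmsr_cdb;

        /* container_of only yields a valid CCB if arcmsr_cdb really points
         * into a CommandControlBlock.  If the value read from the adapter's
         * done queue translates to a bad address (as in the old driver),
         * the subtraction lands outside any allocation. */
        struct CommandControlBlock *ccb =
            container_of(arcmsr_cdb, struct CommandControlBlock, arcmsr_cdb);

        assert(ccb == &ccb_storage);
        printf("offset of arcmsr_cdb in CCB: %zu bytes\n",
               offsetof(struct CommandControlBlock, arcmsr_cdb));
        return 0;
    }
    ```

    The macro does no validation at all, which is why the driver fix had to make sure the physical-to-virtual translation itself produces a correct address before container_of is applied.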


    Upon Areca’s request, our kernel developers also tested this scenario on different Linux systems (including Ubuntu and CentOS) and provided further information about the test results.

    Devices affected

    We tested other Areca adapters; however, this behavior occurred only on the Areca ARC-1883 SAS RAID Adapter.

    Operating systems affected

    All Linux-based systems that use the Areca driver in versions lower than 1.30.0X.18-140417 are affected.

    What to do if you want to use Open-E DSS V7 with the Areca ARC-1883 SAS RAID Adapter?

    Please contact our support team, which already has a fix for this issue. Additionally, Areca has prepared a driver that fixes the issue. [http://www.areca.com.tw/support/s_linux/linux.htm]

    All in all, we have to say that we were impressed with Areca’s technical support. They reacted very fast from the moment our QA team first reported the problem. Then, upon Areca’s request, our team investigated the issue further. Once the problem was identified, our technology partner Areca quickly prepared a driver fixing it. Great job, guys!
